Distribution subentities

SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP

https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

Distribution subentities #261

Closed rbartley-uk closed 1 year ago

rbartley-uk commented 1 year ago

I am currently working on a metadata project where I would like to be able to describe distributions in much more detail than would appear possible under DCAT-AP. I have attached an entity diagram that explains the structure that I would like to describe. Does anyone have any suggestions?

This is my first time posting on GitHub, so apologies if I haven't followed the usual conventions.

[Attached diagram: Mashup Distribution]

bertvannuffelen commented 1 year ago

@rbartley-uk can you explain your use case? The diagram suggests a lot but does not show what your challenge is.

rbartley-uk commented 1 year ago

@bertvannuffelen thanks for responding so quickly.

The arrow at the top of the diagram represents the other DCAT-AP entities, while the area inside the dashed blue line represents a particular type of distribution which we refer to as a mashup. For documentation and application automation reasons, we need to hold metadata at each level within this distribution hierarchy and also store the relationships between the entities.

At a simpler conceptual level, we could imagine a situation where we wished to store metadata relating to each data element of a spreadsheet, e.g. datatype. I don't see a way of doing this with DCAT-AP.

bertvannuffelen commented 1 year ago

Is a mashup an application or a file that you can download?

rbartley-uk commented 1 year ago

It is not downloaded as a whole (although individual visualizations can be downloaded). Here is an example from data.europa.eu: https://data.europa.eu/data/datasets/ef241f7d-b29e-4f02-8520-7d2ceabd66c9?locale=en The first distribution in the list, "Corona dashboard - DataM" (URL https://datam.jrc.ec.europa.eu/datam/mashup/CORONA/index.html), is a mashup, but in this case no metadata relating to the structure of the mashup is provided through data.europa.eu.

sirex commented 1 year ago

Not sure if you are trying to solve the same issue as we are. Your case looks closer to the SDMX data model.

In our case, we also want to describe the content of distributions, such as classes and properties, and to map the tables, sheets, elements or whatever else a Distribution has to ontologies via a logical data model. The logical data model is also used to automatically generate an API for all the distributions, so that data can be accessed in the same way even if the distributions are provided in different formats.

To track usage, we map datasets to projects or use cases.

Instead of creating a special kind of distribution, we simply extend DCAT with additional data that describes the data at the level of a single data element (property, column).
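A minimal sketch of what such a column-level extension could look like, written as JSON-LD built from a plain Python dict. This is only an illustration under stated assumptions: the `ex:` namespace, the terms `ex:dataElement`, `ex:DataElement` and `ex:datatype`, and the helper `describe_distribution` are hypothetical placeholders, not terms from DCAT or DCAT-AP. Only `dcat:Distribution`, `dcat:accessURL` and `dct:title` are existing vocabulary terms.

```python
import json

# Hypothetical extension namespace (assumption); not part of DCAT or DCAT-AP.
EX = "https://example.org/ns#"


def describe_distribution(dist_uri, access_url, columns):
    """Build a JSON-LD dict for a distribution with per-column metadata.

    `columns` is a list of (name, datatype) pairs, where datatype is a
    compact IRI such as "xsd:integer".
    """
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
            "xsd": "http://www.w3.org/2001/XMLSchema#",
            "ex": EX,
        },
        "@id": dist_uri,
        "@type": "dcat:Distribution",
        "dcat:accessURL": {"@id": access_url},
        # Hypothetical property linking the distribution to its data elements.
        "ex:dataElement": [
            {
                "@type": "ex:DataElement",
                "dct:title": name,
                "ex:datatype": {"@id": datatype},
            }
            for name, datatype in columns
        ],
    }


if __name__ == "__main__":
    doc = describe_distribution(
        "https://example.org/distribution/1",
        "https://example.org/data.csv",
        [("date", "xsd:date"), ("cases", "xsd:integer")],
    )
    print(json.dumps(doc, indent=2))
```

Keeping the data elements as nested nodes under the distribution means the usual DCAT-AP properties stay untouched, and a consumer that does not know the extension can simply ignore the extra property.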

For statistical data, we use the Data Cube Vocabulary.

bertvannuffelen commented 1 year ago

@rbartley-uk you might also have a look at the guidelines on Distributions and Data Services; see https://github.com/SEMICeu/DCAT-AP/blob/master/releases/2.1.1/usageguide-dataset-distribution-dataservice.md

bertvannuffelen commented 1 year ago

@rbartley-uk, as @sirex shows, this is indeed a possible approach (cf. StatDCAT-AP) that works for data that can be represented in a tabular format. Data Cube is a vocabulary for representing cube-structured data. In general, though, not all data is shared in a cube structure; some is graph-oriented, some is stream-based, etc.

From a semantic perspective these are all the same (or very closely related), but for applications and developers this is a major distinction. Note that in many cases one technical structure could be converted into another using some (complex, yet to be developed) transformation process. But that transformation is not always present, and thus this distinction finds its way into the metadata description.

Specifying these assumptions will make your profiling easier.

rbartley-uk commented 1 year ago

@sirex Yes, our use cases seem quite similar. I see that you are effectively extending DCAT-AP using a Logical Data Model. Since I would have thought this is a fairly common requirement, I wonder whether it would be a good idea to create another extension to DCAT-AP (similar to StatDCAT-AP), or whether this would be better included in v3 of DCAT-AP. @bertvannuffelen, what is your view on this?

bertvannuffelen commented 1 year ago

@rbartley-uk I am closing this issue as we have spoken to each other.