Closed rbartley-uk closed 1 year ago
@rbartley-uk can you explain your use case? The diagram suggests a lot, but it does not say what your challenge is.
@bertvannuffelen thanks for responding so quickly.
The arrow at the top of the diagram represents the other DCAT-AP entities, while the dashed blue area represents a particular type of distribution which we refer to as a mashup. For documentation and application automation reasons, we need to hold metadata at each level within this distribution hierarchy and also store the relationships between the entities.
At a simpler conceptual level, we could imagine a situation where we wished to store metadata relating to each data element of a spreadsheet, e.g. datatype. I don't see a way of doing this with DCAT-AP.
Is a mashup an application or a file that you can download?
It is not downloaded as a whole (although individual visualizations can be downloaded). Here is an example from data.europa.eu: https://data.europa.eu/data/datasets/ef241f7d-b29e-4f02-8520-7d2ceabd66c9?locale=en The first distribution in the list, "Corona dashboard - DataM" (URL https://datam.jrc.ec.europa.eu/datam/mashup/CORONA/index.html), is a mashup, but in this case no metadata relating to the structure of the mashup is provided through data.europa.eu.
I'm not sure whether you are trying to solve the same issue as we are. Your case looks closer to the SDMX data model.
In our case, we also want to describe the content of distributions, such as classes and properties, and to map the tables, sheets, elements or whatever else a Distribution contains to ontologies via a logical data model. The logical data model is also used to automatically generate an API for all the distributions, so that data can be accessed in the same way even if the distributions are provided in different formats.
To track usage, we map datasets to projects or use cases.
Instead of creating a special kind of distribution, we simply extend DCAT with additional data to describe data at the level of a single data element (property, column).
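To make the idea of describing single data elements a bit more concrete, here is a rough sketch in Python that builds a JSON-LD description of a distribution with per-column metadata. The `ex:` namespace and the terms `ex:DataElement`, `ex:hasDataElement`, `ex:datatype` and `ex:mapsTo` are hypothetical placeholders invented for illustration; they are not part of DCAT, DCAT-AP, or any extension mentioned in this thread. Only the `dcat:` and `dcterms:` terms are real.

```python
import json

# Hypothetical extension namespace; NOT part of DCAT or DCAT-AP.
EX = "https://example.org/ext#"

CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": EX,
}


def describe_column(name, datatype, maps_to=None):
    """Return a JSON-LD node for one data element (column/property)."""
    node = {
        "@type": "ex:DataElement",  # hypothetical class
        "dcterms:title": name,
        "ex:datatype": {"@id": f"xsd:{datatype}"},  # hypothetical property
    }
    if maps_to is not None:
        # Hypothetical link from the column to an ontology term.
        node["ex:mapsTo"] = {"@id": maps_to}
    return node


def describe_distribution(dist_id, title, download_url, columns):
    """Return a JSON-LD document for a distribution and its data elements."""
    return {
        "@context": CONTEXT,
        "@id": dist_id,
        "@type": "dcat:Distribution",
        "dcterms:title": title,
        "dcat:downloadURL": {"@id": download_url},
        "ex:hasDataElement": columns,  # hypothetical property
    }


distribution = describe_distribution(
    "https://example.org/distribution/1",
    "Example spreadsheet",
    "https://example.org/files/example.csv",
    [
        describe_column("country", "string", maps_to="http://schema.org/Country"),
        describe_column("population", "integer"),
    ],
)

print(json.dumps(distribution, indent=2))
```

The point of the sketch is only that the distribution stays a plain `dcat:Distribution` and the element-level metadata hangs off it through extra properties, rather than through a new kind of distribution.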
For statistical data, we use the Data Cube Vocabulary.
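As an illustration of what a Data Cube description of the structure can look like, the following Python sketch emits Turtle for a `qb:DataStructureDefinition` with dimension and measure components. The `qb:` terms come from the RDF Data Cube Vocabulary; the `ex:` namespace and the names `populationStructure`, `refArea`, `refPeriod` and `population` are made up for this example and do not come from any real dataset.

```python
QB = "http://purl.org/linked-data/cube#"
EX = "https://example.org/stat#"  # hypothetical namespace for the example


def build_dsd(dsd_name, dimensions, measures):
    """Return Turtle text for a qb:DataStructureDefinition."""
    lines = [f"@prefix qb: <{QB}> .", f"@prefix ex: <{EX}> .", ""]

    # One blank-node component specification per dimension and measure.
    components = [f"[ qb:dimension ex:{d} ]" for d in dimensions]
    components += [f"[ qb:measure ex:{m} ]" for m in measures]

    lines.append(f"ex:{dsd_name} a qb:DataStructureDefinition ;")
    lines.append("    qb:component " + " ,\n        ".join(components) + " .")
    lines.append("")

    # Declare each component property with its Data Cube type.
    for d in dimensions:
        lines.append(f"ex:{d} a qb:DimensionProperty .")
    for m in measures:
        lines.append(f"ex:{m} a qb:MeasureProperty .")
    return "\n".join(lines)


turtle = build_dsd(
    "populationStructure",
    dimensions=["refArea", "refPeriod"],
    measures=["population"],
)
print(turtle)
```

A `qb:DataSet` would then point to this definition via `qb:structure`; how that data set is linked to the DCAT dataset or distribution is left open here.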
@rbartley-uk you might also have a look at the guidelines on Distributions and Data Services: https://github.com/SEMICeu/DCAT-AP/blob/master/releases/2.1.1/usageguide-dataset-distribution-dataservice.md
@rbartley-uk, as @sirex shows, this is indeed a possible approach (cf. StatDCAT-AP) that works for data that can be represented in a tabular format. Data Cube is a vocabulary for representing cube-structured data. But in general not all data is shared in a cube structure; it may also be graph-oriented, streamed, etc.
From a semantic perspective these are all the same (or very closely related), but for applications and developers this is a major distinction. Note that in many cases one technical structure could be converted into another using some (complex, yet to be developed) transformation process. But that transformation is not always available, and thus this distinction finds its way into the metadata description.
Specifying these assumptions will make your profiling easier.
@sirex Yes, our use cases seem quite similar. I see that you are effectively extending DCAT-AP using a Logical Data Model. I would have thought that this is a fairly common requirement, so I wonder whether it would be a good idea to create another extension to DCAT-AP (similar to StatDCAT-AP) or whether this would be better included in v3 of DCAT-AP. @bertvannuffelen, what is your view on this?
@rbartley-uk I am closing this issue, as we have spoken to each other.
I am currently working on a metadata project where I would like to be able to describe distributions in much more detail than would appear possible under DCAT-AP. I have attached an entity diagram that explains the structure that I would like to describe. Does anyone have any suggestions?
This is my first time posting on GitHub, so apologies if I haven't followed the usual conventions.