co-cddo / ukgov-metadata-exchange-model

A metadata model for describing data assets for exchanging between UK government organisations.
https://co-cddo.github.io/ukgov-metadata-exchange-model/

Capture the schema of the underlying Dataset #4

Open AlasdairGray opened 1 year ago

AlasdairGray commented 1 year ago

The schema of a Dataset helps a technical acquirer to understand and assess the data.

Extend the exchange model to capture the schema of the Dataset.

DCAT recommends using the dct:conformsTo property for capturing the schema of the Dataset; see §6.4.2 of DCATv3.
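As a rough illustration of what this could look like, here is a minimal sketch that builds a JSON-LD description of a dcat:Dataset with a dct:conformsTo link. The helper name `dataset_record` and all `example.org` IRIs are hypothetical and only there for illustration; the exchange model may well choose a different serialisation.

```python
import json


def dataset_record(dataset_iri, title, schema_iri):
    """Build a minimal JSON-LD description of a dcat:Dataset.

    The dct:conformsTo value points at a schema published elsewhere.
    This helper is a hypothetical illustration, not part of the model.
    """
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": dataset_iri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:conformsTo": {"@id": schema_iri},
    }


# All IRIs below are made-up placeholders.
record = dataset_record(
    "https://example.org/dataset/road-works",
    "Road works",
    "https://example.org/schema/road-works.schema.json",
)
print(json.dumps(record, indent=2))
```

A consumer would then only need to follow the dct:conformsTo IRI to find the schema, whatever form that schema takes.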

AlasdairGray commented 1 year ago

DCAT does not give any guidance on how to capture the schema of the underlying dataset. We will need to support a wide variety of dataset formats, including CSV, JSON, GeoJSON, and XML.

For applications such as the Data Marketplace to exploit schema-level information, it would be beneficial to have an agreed approach, though this is likely to differ depending on the dataset media type.

For tabular data there is a government recommendation to share it as CSV, and a further recommendation to use CSVW (CSV on the Web) to capture the metadata. CSVW is a W3C Recommendation for describing CSV files and is capable of modelling the column headings and the relationships between them. This would allow existing CSVW processing tools to be used to manipulate the metadata.
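To make the CSVW option more concrete, here is a minimal sketch that builds a CSVW metadata document for a small table and checks that a CSV header row matches the declared column titles. The helper names `csvw_metadata` and `header_matches`, the sample columns, and the `example.org` URL are all hypothetical; a real deployment would more likely rely on an existing CSVW processor than on a hand-rolled check like this.

```python
import csv
import io
import json


def csvw_metadata(csv_url, columns, primary_key=None):
    """Build a minimal CSVW metadata document.

    `columns` is a list of (name, title, datatype) tuples.
    """
    table_schema = {
        "columns": [
            {"name": name, "titles": title, "datatype": datatype}
            for name, title, datatype in columns
        ]
    }
    if primary_key is not None:
        table_schema["primaryKey"] = primary_key
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": table_schema,
    }


def header_matches(metadata, csv_text):
    """Return True if the CSV header row equals the declared column titles."""
    header = next(csv.reader(io.StringIO(csv_text)))
    expected = [col["titles"] for col in metadata["tableSchema"]["columns"]]
    return header == expected


# Hypothetical table: the URL and column names are placeholders.
metadata = csvw_metadata(
    "https://example.org/data/road-works.csv",
    [
        ("id", "ID", "integer"),
        ("road", "Road", "string"),
        ("start_date", "Start date", "date"),
    ],
    primary_key="id",
)
sample_csv = "ID,Road,Start date\n1,A1,2024-01-15\n"

print(json.dumps(metadata, indent=2))
print(header_matches(metadata, sample_csv))
```

The resulting metadata document is plain JSON, so it could be published alongside the CSV file and referenced from the dataset description.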

For XML and JSON, there exist XML Schema and JSON Schema respectively. These can be published on the web and the dct:conformsTo property could link to the file (or we could investigate embedding it within the metadata). The schema information can then be processed using standard tooling available in multiple languages. This approach also means that the metadata publisher will not need to do any additional modelling for their schema-level metadata.
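The per-media-type approach suggested in this comment could be sketched as a simple lookup from a distribution's media type to the schema language expected behind dct:conformsTo. The function name `schema_language_for` and the string labels are hypothetical, and the mapping covers only the formats this comment proposes an approach for; GeoJSON is deliberately left unmapped because no approach has been proposed for it here.

```python
# Mapping from media type to the schema language suggested in this thread.
# The labels on the right are illustrative strings, not agreed identifiers.
SCHEMA_LANGUAGE_BY_MEDIA_TYPE = {
    "text/csv": "CSVW",
    "application/json": "JSON Schema",
    "application/xml": "XML Schema",
    "text/xml": "XML Schema",
}


def schema_language_for(media_type):
    """Return the suggested schema language for a media type, or None.

    Parameters such as "; charset=utf-8" are ignored, and matching is
    case-insensitive.
    """
    base_type = media_type.split(";", 1)[0].strip().lower()
    return SCHEMA_LANGUAGE_BY_MEDIA_TYPE.get(base_type)


for media_type in ("text/csv; charset=utf-8", "application/json", "application/geo+json"):
    print(media_type, "->", schema_language_for(media_type))
```

An application such as the Data Marketplace could use a lookup like this to decide which tooling to apply to the schema it finds at the dct:conformsTo link.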