SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

Cardinality of dct:format/dcat:mediaType in combination with dcat:package/compressFormat #200

Closed init-dcat-ap-de closed 2 years ago

init-dcat-ap-de commented 3 years ago

Hello,

With the inclusion of dcat:packageFormat and dcat:compressFormat, shouldn't it be possible to have multiple dct:format or dcat:mediaType values?

Otherwise we would imply that all files within a zip have the same format.

jakubklimek commented 3 years ago

Actually, AFAIK this is by design, and it also applies to dcterms:conformsTo, which specifies the schema of the distribution. The use case here is to allow e.g. one CSV file to be compressed using gzip (compressFormat), or a set of files of the same format and schema, split into multiple files e.g. for size reasons, to be packaged as one (packageFormat).
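To make the intended pattern concrete, here is a minimal sketch that renders a DCAT-AP style distribution as Turtle. The point it illustrates is that dcat:mediaType (and dct:conformsTo) describe the content of every wrapped file, while dcat:compressFormat and dcat:packageFormat describe only the wrapper. The function name, the example.org IRIs and the schema IRI are hypothetical; the IANA media-type IRI pattern is the one commonly used in DCAT-AP examples.

```python
from typing import Optional

# Assumption: media types are expressed as IANA registry IRIs.
IANA = "http://www.iana.org/assignments/media-types/"
PREFIXES = (
    "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
    "@prefix dct: <http://purl.org/dc/terms/> .\n\n"
)


def distribution_ttl(iri: str, media_type: str,
                     conforms_to: Optional[str] = None,
                     compress: Optional[str] = None,
                     package: Optional[str] = None) -> str:
    """Hypothetical helper: one distribution, exactly ONE dcat:mediaType.

    compress/package only say how the files are wrapped (gzip, zip, ...);
    media_type and conforms_to apply to every file inside the wrapper.
    """
    props = [f"dcat:mediaType <{IANA}{media_type}>"]
    if conforms_to:
        props.append(f"dct:conformsTo <{conforms_to}>")
    if compress:
        props.append(f"dcat:compressFormat <{IANA}{compress}>")
    if package:
        props.append(f"dcat:packageFormat <{IANA}{package}>")
    body = " ;\n    ".join(props)
    return PREFIXES + f"<{iri}> a dcat:Distribution ;\n    {body} ."


# Case 1: a single CSV file compressed with gzip.
print(distribution_ttl("https://example.org/dist/sales-csv-gz",
                       "text/csv", compress="application/gzip"))

# Case 2: CSV parts with the same schema, packaged into one ZIP.
print(distribution_ttl("https://example.org/dist/sales-parts-zip",
                       "text/csv",
                       conforms_to="https://example.org/schema/sales",
                       package="application/zip"))
```

In both cases a single media type is enough precisely because all wrapped files share one format and schema.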

If each packaged file were different, there would be no way to properly describe the format and the schema of the individual packaged files with a dcat:Distribution. Therefore, the use case here should not be "zip anything you want into one file and you have a proper distribution".

Also, having multiple formats and schemas would not solve this, as there would be no way of saying which format and which schema goes with which packaged file.
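A small sketch of why multi-valued properties would not help: if a distribution simply listed several dcat:mediaType and dct:conformsTo values, it would record two unordered sets, and two packages with different file-to-format/schema assignments would end up with identical metadata. The package contents and schema IRIs below are hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

# A package is modelled as: file name -> (media type, schema IRI).
Package = Dict[str, Tuple[str, str]]


def flatten(package: Package) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """What a distribution-level description with multi-valued
    dcat:mediaType and dct:conformsTo would keep: two unordered sets."""
    media_types = frozenset(mt for mt, _ in package.values())
    schemas = frozenset(schema for _, schema in package.values())
    return media_types, schemas


pkg_a: Package = {
    "part-1": ("text/csv", "https://example.org/schema/persons"),
    "part-2": ("application/json", "https://example.org/schema/addresses"),
}
# Same formats and schemas, but assigned to the other files.
pkg_b: Package = {
    "part-1": ("application/json", "https://example.org/schema/addresses"),
    "part-2": ("text/csv", "https://example.org/schema/persons"),
}

# Different contents, identical metadata: the per-file binding is lost.
print(pkg_a != pkg_b, flatten(pkg_a) == flatten(pkg_b))  # True True
```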

An exception would be if the contents of e.g. the ZIP file are standardized (e.g. a .docx file is actually a ZIP file and a package). But then it has its own media type.
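As an illustration of a standardized package, the sketch below uses Python's standard zipfile module on an in-memory archive. Office Open XML files such as .docx are ZIP archives following the Open Packaging Conventions, which require a [Content_Types].xml part at the root; such a file is described by its own media type rather than application/zip. The check is a rough heuristic, not a validator, and the archive built here is a deliberately incomplete stand-in for a real .docx.

```python
import io
import zipfile

# Registered media type for .docx; used instead of application/zip.
DOCX_MEDIA_TYPE = (
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
)


def looks_like_ooxml(data: bytes) -> bool:
    """Heuristic: a ZIP archive with a root [Content_Types].xml part."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return False
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return "[Content_Types].xml" in zf.namelist()


# Minimal docx-shaped ZIP built in memory (not a valid Word document).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("word/document.xml", "<w:document/>")

print(looks_like_ooxml(buf.getvalue()))  # True
```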

Maybe this should be included as a note somewhere in the document?

init-dcat-ap-de commented 3 years ago

While I agree that this is not optimal, it is the reality. If I search for ZIP distributions (https://data.europa.eu/data/datasets?locale=de&format=ZIP) I can find e.g.: https://data.europa.eu/data/datasets/movimento-migratorio-cancellati-dei-cittadini-stranieri-in-anagrafe-per-sesso-anni-2003-2013

These are a collection of different file types. Most "ZIP" files I find are actually Shapefiles...
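A rough way to tell these cases apart is to inspect the member names of an archive (for instance the list returned by zipfile.ZipFile.namelist()). The hypothetical helper below distinguishes a homogeneous package (one media type fits all files), a Shapefile bundle (several extensions, but one logical multi-file format) and a heterogeneous bundle. Classifying by file extension is a simplifying assumption.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Esri Shapefile components: .shp, .shx and .dbf are mandatory.
SHAPEFILE_REQUIRED = {".shp", ".shx", ".dbf"}
SHAPEFILE_OPTIONAL = {".prj", ".cpg", ".sbn", ".sbx", ".xml"}


def classify_members(names: Iterable[str]) -> str:
    """Hypothetical classifier for ZIP member names (directories skipped)."""
    exts = {PurePosixPath(n).suffix.lower()
            for n in names if not n.endswith("/")}
    if len(exts) == 1:
        return "homogeneous"
    if SHAPEFILE_REQUIRED <= exts <= SHAPEFILE_REQUIRED | SHAPEFILE_OPTIONAL:
        return "shapefile"
    return "heterogeneous"


print(classify_members(["part-01.csv", "part-02.csv"]))
print(classify_members(["roads.shp", "roads.shx", "roads.dbf", "roads.prj"]))
print(classify_members(["readme.pdf", "data.xls", "data.csv"]))
```

Only the first two outcomes fit the "one format per distribution" reading described earlier in the thread.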

jakubklimek commented 3 years ago

Yes. Then the question is whether this is just bad usage of DCAT (which is my opinion) or something we should aim to support. In the latter case we would probably need to make dcat:Distribution much more complex, and the question is whether the publishers currently publishing these ZIP files would be willing to describe their contents using this more complex approach.
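To give a sense of what "much more complex" could mean, the sketch below models a package that lists each member file with its own path, media type and schema. This structure is purely hypothetical and is not part of DCAT or DCAT-AP; it only shows the extra per-file information publishers would have to supply.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PackagedFile:
    """Hypothetical per-file description (NOT a DCAT/DCAT-AP class)."""
    path: str
    media_type: str
    conforms_to: Optional[str] = None


@dataclass
class PackagedDistribution:
    """Hypothetical distribution that enumerates its packaged members."""
    access_url: str
    package_format: str
    members: List[PackagedFile] = field(default_factory=list)

    def media_types(self) -> List[str]:
        # Distinct member media types, sorted for stable output.
        return sorted({m.media_type for m in self.members})


dist = PackagedDistribution(
    "https://example.org/data.zip",
    "application/zip",
    [PackagedFile("data.csv", "text/csv"),
     PackagedFile("readme.pdf", "application/pdf")],
)
print(dist.media_types())
```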

init-dcat-ap-de commented 3 years ago

I am not sure it would be necessary to make the description much more complex. If the ZIP file is a bundle of various files, it will probably never be possible to work with the content automatically based on the metadata description alone.

jakubklimek commented 3 years ago

Yes, if we did not aim at automatic use / proper description, then it could stay similar. However, I am still struggling with calling this a supported case. I would rather say "think again about how you structure your data so that it can be described properly".
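One possible way to restructure a mixed bundle is to split it into one distribution per format, so that each can carry a single dct:format / dcat:mediaType. The helper below is a hypothetical sketch that groups file names by extension; treating the extension as the format is a simplifying assumption, and multi-file formats such as Shapefiles would need to be kept together separately.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def group_by_extension(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group file names by lower-cased extension; each group is a
    candidate for its own dcat:Distribution with one media type."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        ext = PurePosixPath(name).suffix.lower() or "(none)"
        groups[ext].append(name)
    return {ext: sorted(files) for ext, files in sorted(groups.items())}


print(group_by_extension(["2003.csv", "2004.csv", "notes.pdf"]))
```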