SEMICeu / DCAT-AP

This is the issue tracker for the maintenance of DCAT-AP
https://joinup.ec.europa.eu/solution/dcat-application-profile-data-portals-europe

Switch Recommendations for the Use of dct:format and dcat:mediaType #184

Closed init-dcat-ap-de closed 2 years ago

init-dcat-ap-de commented 3 years ago

In DCAT-AP, dct:format is recommended and dcat:mediaType is only optional.

But https://www.w3.org/TR/vocab-dcat/ says:

dcat:mediaType-Usage note: This property SHOULD be used when the media type of the distribution is defined in IANA [IANA-MEDIA-TYPES], otherwise dct:format MAY be used with different values.

and

dct:format-Usage note: dcat:mediaType SHOULD be used if the type of the distribution is defined by IANA [IANA-MEDIA-TYPES].

So it should be the other way around: we should recommend using dcat:mediaType, with dct:format only as the optional fallback when the file format is not in the IANA list.
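The rule in the two quoted usage notes can be sketched as a small selection function. This is only an illustration under stated assumptions: the function name `describe_distribution` and the `KNOWN_IANA_TYPES` set (a tiny hand-picked subset, not the IANA registry itself) are hypothetical and do not come from DCAT or DCAT-AP.

```python
# Hypothetical, hand-picked subset of IANA-registered media types.
# The real registry is far larger; this set exists only for the example.
KNOWN_IANA_TYPES = {
    "text/csv",
    "application/json",
    "application/xml",
    "application/pdf",
}


def describe_distribution(media_type=None, fallback_format=None):
    """Pick the property suggested by the DCAT 2 usage notes.

    Returns a (property, value) tuple, or None if nothing can be stated.
    """
    # dcat:mediaType SHOULD be used when the type is defined by IANA.
    if media_type in KNOWN_IANA_TYPES:
        return ("dcat:mediaType", media_type)
    # Otherwise dct:format MAY be used with a different value.
    if fallback_format is not None:
        return ("dct:format", fallback_format)
    return None
```

For instance, `describe_distribution("text/csv", "CSV")` selects dcat:mediaType, while an unregistered type with a fallback value selects dct:format.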

init-dcat-ap-de commented 3 years ago

Or is dct:format recommended because it uses the European list? From that perspective I can understand the "switched" recommendations.

bertvannuffelen commented 3 years ago

@init-dcat-ap-de here is some (historical) context to answer your question.

In the early DCAT-AP discussions on format vs mediatype the following observations were made:

Also, DCAT-AP most often considers the human as the primary end user. Let's consider a person exploring an Open Data Portal. This person will not search for application/xslt+xml when looking for XML resources. A filter on xml that covers all catalogued resources will be sufficient.

As the Publications Office (PO) already had a Named Authority List (NAL) collecting the most commonly used formats (although it is also not complete), recommending format over mediatype was natural.

The above interpretation can be formalized as the following template: curl -H "Accept: {dcat:mediaType}" {dcat:downloadURL} will return a file in the format {dct:format}.

A concrete example: curl -H "Accept: application/xml" https://example.com/distribution/3123123 returns an XML file, but that file can be an RDF/XML serialization. In that case the format of the data is RDF.
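The same template can be sketched in Python. This is a minimal sketch, not part of the specification: the `build_request` helper and the dictionary keys are assumptions made for illustration, and the request is only constructed, never sent, because the example URL is a placeholder.

```python
from urllib.request import Request


def build_request(distribution):
    """Build (but do not send) a GET request that follows the template above."""
    return Request(
        distribution["downloadURL"],                    # {dcat:downloadURL}
        headers={"Accept": distribution["mediaType"]},  # {dcat:mediaType}
    )


# Hypothetical distribution record mirroring the concrete example.
distribution = {
    "downloadURL": "https://example.com/distribution/3123123",
    "mediaType": "application/xml",  # how the data is serialized
    "format": "RDF",                 # what the data is
}

request = build_request(distribution)
```

The `format` value plays no role in the request itself; it describes what a consumer should expect to find inside the returned XML.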

Given this context, I understand the question, as both properties appear to overlap (DCAT 2.0 has made that even more explicit). The above explanation defines distinct purposes. Observe that in this interpretation not providing the mediatype is less problematic than not providing the format. If one only has the mediatype, then one has to guess from generic serialization IANA types like application/xml or application/json whether the format has a business meaning (e.g. docx is an XML serialization of a Word document). This balance between serialization and business format is non-trivial.
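The guessing problem can be illustrated with a small lookup. The mapping below is hand-written for the example and is an assumption, not data taken from IANA or from the Publications Office NAL; the function name `guess_format` is likewise hypothetical.

```python
# Hypothetical mapping from a media type to the formats it might carry.
# Generic serialization types admit more than one candidate.
CANDIDATE_FORMATS = {
    "application/xml": ["XML", "RDF"],
    "application/json": ["JSON", "GeoJSON"],
    "text/csv": ["CSV"],
}


def guess_format(media_type):
    """Return a format only when the media type determines it uniquely."""
    candidates = CANDIDATE_FORMATS.get(media_type, [])
    # More than one candidate (or none) means the format cannot be inferred.
    return candidates[0] if len(candidates) == 1 else None
```

Here `guess_format("text/csv")` yields a single answer, whereas `guess_format("application/xml")` returns `None` because the media type alone does not say whether the content is plain XML or RDF.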

bertvannuffelen commented 2 years ago

proposal:

to keep the specification as is.

bertvannuffelen commented 2 years ago

Since there are no objections and no further improvements have been suggested, we maintain the specification as is.