adiwg / mdTranslator

Metadata translation tool built using Ruby
https://www.adiwg.org/mdTranslator/
The Unlicense

DCAT-US writer: Distribution -> Access/Download URL, Media Type #275

Status: Open (opened by jwaspin 11 months ago)

jwaspin commented 11 months ago

distribution

mdJSON source: tbd

| Field Name | DCAT Name | Condition | mdJson Source |
|---|---|---|---|
| Distribution | dcat:distribution | if exists resourceDistribution[any] and if exists resourceDistribution.distributor[any].transferOption[any].onlineOption[any].uri | for each resourceDistribution[0, n] where exists resourceDistribution.distributor.transferOption.onlineOption.uri, then {description, accessURL, downloadURL, mediaType, title} |
| - Description | dcat:distribution.description | exists | resourceDistribution.description |
| - AccessURL | dcat:distribution.accessURL | if citation.onlineResources[first occurrence].uri path ends in ".html" [required if applicable] | resourceDistribution.distributor.transferOption.onlineOption.uri |
| - DownloadURL | dcat:distribution.downloadURL | if citation.onlineResources[first occurrence].uri path does not end in ".html" [required if applicable] | resourceDistribution.distributor.transferOption.onlineOption.uri |
| - MediaType | dcat:distribution.mediaType | [add codelist of "dataFormat"] | transferOption.distributionFormat.formatSpecification.title [dataFormat]; dataFormat should conform to https://www.iana.org/assignments/media-types/media-types.xhtml |
| - Title | dcat:distribution.title | exists | resourceDistribution.distributor.transferOption.onlineOption.name |
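The mapping rules above can be sketched in Ruby as follows. This is a minimal illustration, not mdTranslator's actual DCAT-US writer. It assumes the mdJSON record has been parsed into plain hashes with string keys, treats `distributionFormat` as an array, and emits one DCAT distribution per `onlineOption` that has a `uri`. The helpers `as_list`, `html_page?` and `dcat_distributions` are hypothetical names.

```ruby
require 'uri'
require 'json'

# Hypothetical helper: treat nil as an empty list and wrap single objects,
# so every mdJSON level can be iterated the same way.
def as_list(value)
  return [] if value.nil?
  value.is_a?(Array) ? value : [value]
end

# AccessURL rule from the table: the URI path ends in ".html".
# Anything else (or an unparseable URI) is treated as a download.
def html_page?(uri)
  URI.parse(uri).path.to_s.end_with?('.html')
rescue URI::InvalidURIError
  false
end

# Hypothetical sketch: one DCAT-US distribution per onlineOption that has a uri.
def dcat_distributions(md_json)
  as_list(md_json['resourceDistribution']).flat_map do |dist|
    as_list(dist['distributor']).flat_map do |distributor|
      as_list(distributor['transferOption']).flat_map do |transfer|
        # mediaType comes from the (uncontrolled) formatSpecification.title
        media_type = as_list(transfer['distributionFormat'])
                       .map { |f| f.dig('formatSpecification', 'title') }
                       .compact.first
        as_list(transfer['onlineOption']).filter_map do |online|
          uri = online['uri']
          next if uri.nil? || uri.empty? # skipped entries are dropped by filter_map
          url_key = html_page?(uri) ? 'accessURL' : 'downloadURL'
          {
            '@type' => 'dcat:Distribution',
            'description' => dist['description'],
            url_key => uri,
            'mediaType' => media_type,
            'title' => online['name']
          }.compact # omit fields that have no mdJSON source value
        end
      end
    end
  end
end

sample = {
  'resourceDistribution' => [{
    'description' => 'Survey data',
    'distributor' => [{
      'transferOption' => [{
        'distributionFormat' => [{ 'formatSpecification' => { 'title' => 'text/csv' } }],
        'onlineOption' => [
          { 'uri' => 'https://example.com/data/index.html', 'name' => 'Landing page' },
          { 'uri' => 'https://example.com/data/survey.csv', 'name' => 'CSV download' }
        ]
      }]
    }]
  }]
}

result = dcat_distributions(sample)
puts JSON.pretty_generate(result)
```

With the sample record, the `.html` link becomes an `accessURL` and the CSV link a `downloadURL`. Both share the same `mediaType`, because the format is recorded once per `transferOption`.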
dwalt commented 11 months ago

The code apparently maps this to distribution.distributor[ ].transferOption[ ].distributionFormat.formatSpecification.title. This works, sort of. However, it is really just a citation title: an uncontrolled field whose values will not necessarily conform to the DCAT media-type list. A citation title doesn't seem an appropriate place for a format name; it is only used because ISO, astonishingly, has no distribution format codelist. Even CSDGM had at least a "recommended" list.

We can either accept that, let users enter whatever they want, and call it the media type even if they entered their dog's name, or we can consider an extended ADIwg codelist to control the types. IMO, this is not just a DCAT issue but a matter of normalizing format types across our data documentation. This approach would be specific to mdJSON, so a potential ISO reader (and probably CSDGM) would still have to map to formatSpecification.
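If the codelist route is taken, a writer could check the free-text title before emitting it as `mediaType`. The sketch below is hypothetical. `DATA_FORMAT_CODES` stands in for the proposed ADIwg "dataFormat" codelist and holds only a few illustrative IANA entries. The fallback check uses the `type/subtype` name syntax from RFC 6838, which treats media type names as case-insensitive.

```ruby
# Hypothetical stand-in for an extended ADIwg "dataFormat" codelist;
# a real list would be drawn from the IANA media-types registry.
DATA_FORMAT_CODES = %w[
  application/json application/pdf application/zip
  image/tiff text/csv text/html
].freeze

# type "/" subtype, each a restricted-name per RFC 6838 (first char alphanumeric,
# then up to 126 of: alphanumerics ! $ & ^ _ . + # -).
MEDIA_TYPE_SYNTAX = /\A[a-z0-9][a-z0-9!$&^_.+#-]{0,126}\/[a-z0-9][a-z0-9!$&^_.+#-]{0,126}\z/i

# Returns [normalized_media_type_or_nil, status] for a formatSpecification.title.
def classify_format(title)
  candidate = title.to_s.strip.downcase # media type names are case-insensitive
  if DATA_FORMAT_CODES.include?(candidate)
    [candidate, :codelist]
  elsif MEDIA_TYPE_SYNTAX.match?(candidate)
    [candidate, :well_formed_unlisted] # syntactically valid but not in the codelist
  else
    [nil, :rejected] # e.g. a dog's name
  end
end

[' Text/CSV ', 'application/vnd.geo+json', 'Rover'].each do |title|
  p [title, classify_format(title)]
end
```

Returning a status instead of raising an error leaves the policy decision open. A writer could drop rejected values, pass well-formed but unlisted types through with a warning, or insist on codelist membership.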