Closed init-dcat-ap-de closed 12 months ago
The IANA media-type and extension are encoded as http://publications.europa.eu/ontology/authority/legacy-code
although not all file-types in the list have them.
You may use this query to see which ones do.
Thank you @ODP-hil, this looks very useful. Could you post the SPARQL query? So there are already properties for the IANA type and the file extension.
I would love to see them in the current RDF.
Dear @init-dcat-ap-de, I am pasting an updated query with the MIME type encoded as an xlNotation. This is a mandatory property for all Concepts. The other IANA variables are probably redundant at this point, but I left them in the query so you can evaluate which ones to use (endpoint: https://publications.europa.eu/webapi/rdf/sparql):
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX euvoc: <http://publications.europa.eu/ontology/euvoc#>
PREFIX at: <http://publications.europa.eu/ontology/authority/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?OPfileType ?notationIANAType ?ianaMediaType ?ianaCode ?fileExtension
FROM <http://publications.europa.eu/resource/authority/file-type>
WHERE {
  # Mandatory: every concept carries its MIME type as an IANA_MT xlNotation.
  ?OPfileType skos:inScheme <http://publications.europa.eu/resource/authority/file-type> ;
              euvoc:xlNotation ?notation .
  ?notation dct:type <http://publications.europa.eu/resource/authority/notation-type/IANA_MT> ;
            euvoc:xlCodification ?notationIANAType .
  OPTIONAL { ?OPfileType dct:conformsTo ?ianaMediaType . }
  # The dc:source filter sits inside the same OPTIONAL as the mapped code,
  # so it only applies to mapped codes of the current file type.
  OPTIONAL {
    ?OPfileType at:op-mapped-code ?legacyCodeIana .
    ?legacyCodeIana dc:source "mime-type-cellar" ;
                    at:legacy-code ?ianaCode .
  }
  OPTIONAL {
    ?OPfileType at:op-mapped-code ?legacyCodeExtension .
    ?legacyCodeExtension dc:source "file-extension" ;
                         at:legacy-code ?fileExtension .
  }
}
ORDER BY ?OPfileType
Thank you, so the file extension is within the at:op-mapped-code node. For 7Z it is ".7z".
Unfortunately the at:op-mapped-code node is a blank node which is not included in the RDF document. We only get:
<ns9:op-mapped-code rdf:nodeID="b123813191" />
So the problem is probably in the export generation of the RDF files.
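To check whether a given RDF/XML export has this problem, one can look for blank nodes that are referenced by `rdf:nodeID` but never described in the same document. The following is only a rough sketch using Python's standard library. The sample document is hypothetical, and binding the `ns9` prefix to the authority ontology namespace is an assumption. The walker assumes plain striped RDF/XML under an `rdf:RDF` root and ignores `rdf:parseType`.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NODE_ID = "{%s}nodeID" % RDF_NS

# Hypothetical export; the ns9 namespace binding is an assumption.
SAMPLE_EXPORT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ns9="http://publications.europa.eu/ontology/authority/">
  <rdf:Description rdf:about="http://publications.europa.eu/resource/authority/file-type/7Z">
    <ns9:op-mapped-code rdf:nodeID="b123813191" />
  </rdf:Description>
</rdf:RDF>"""


def dangling_blank_nodes(rdf_xml):
    """Return blank node IDs used as objects but never described as a node."""
    root = ET.fromstring(rdf_xml)
    described, referenced = set(), set()

    def visit_node(node):
        # A node element carrying rdf:nodeID describes that blank node.
        node_id = node.get(NODE_ID)
        if node_id is not None:
            described.add(node_id)
        for prop in node:
            # A property element carrying rdf:nodeID only points at a blank node.
            ref = prop.get(NODE_ID)
            if ref is not None:
                referenced.add(ref)
            # Nested node elements alternate with property elements.
            for child in prop:
                visit_node(child)

    for top_level in root:
        visit_node(top_level)
    return referenced - described


print(dangling_blank_nodes(SAMPLE_EXPORT))
```

A non-empty result means the document points at blank nodes whose properties (here the file extension) are not included.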
Yes indeed, but you can download the RDF (either SKOS or SKOS-XL) directly from the download tab of the NAL page (https://op.europa.eu/en/web/eu-vocabularies/dataset/-/resource?uri=http://publications.europa.eu/resource/dataset/file-type). In the RDF this is encoded as follows:
<at:op-mapped-code>
<at:MappedCode>
<dc:source>file-extension</dc:source>
<at:legacy-code>.7z</at:legacy-code>
</at:MappedCode>
</at:op-mapped-code>
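For anyone who wants to pull the extensions out of the downloaded file, here is a minimal sketch with Python's standard library. It assumes the nesting shown above. The wrapping `rdf:RDF` / `rdf:Description` elements and the helper name `mapped_codes` are assumptions for illustration, so the real download may need a different outer structure.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "at": "http://publications.europa.eu/ontology/authority/",
}

# Hypothetical, trimmed-down sample; only the at:op-mapped-code part is
# taken from the thread, the surrounding elements are assumed.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:at="http://publications.europa.eu/ontology/authority/">
  <rdf:Description rdf:about="http://publications.europa.eu/resource/authority/file-type/7Z">
    <at:op-mapped-code>
      <at:MappedCode>
        <dc:source>file-extension</dc:source>
        <at:legacy-code>.7z</at:legacy-code>
      </at:MappedCode>
    </at:op-mapped-code>
  </rdf:Description>
</rdf:RDF>"""


def mapped_codes(rdf_xml, source):
    """Return {concept URI: [legacy codes]} for mapped codes with this dc:source."""
    root = ET.fromstring(rdf_xml)
    about = "{%s}about" % NS["rdf"]
    result = {}
    for concept in root:
        uri = concept.get(about)
        if uri is None:
            continue  # skip blank or anonymous top-level nodes
        for mapped in concept.findall("at:op-mapped-code/at:MappedCode", NS):
            if mapped.findtext("dc:source", namespaces=NS) != source:
                continue
            code = mapped.findtext("at:legacy-code", namespaces=NS)
            if code is not None:
                result.setdefault(uri, []).append(code)
    return result


print(mapped_codes(SAMPLE, "file-extension"))
```

Passing `"mime-type-cellar"` as the source would select the MIME type codes instead, assuming they are encoded in the same way.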
FYI, in the DCAT SHACL shapes, content negotiation is used so that the DCAT-AP validator downloads the codelists dynamically (the validator options with "full" in the name use the imports). See https://github.com/SEMICeu/DCAT-AP/blob/master/releases/2.1.1/dcat-ap_2.1.1_shacl_mdr_imports.ttl
As an action from this, we might have to check whether this list is still up to date.
@init-dcat-ap-de I suppose @ODP-hil explained the organisation of the EU NAL file type and that we can close this exchange.
op-info-helpdesk@publications.europa.eu wrote:
As you can see in the actual source code of the File Type Authority table (can be downloaded from here: File type - EU Vocabularies - Publications Office of the EU (europa.eu)), there are no blank nodes in the data. Consequently, you will also have no issue if you access the data by means of SPARQL queries on the SPARQL endpoint (https://publications.europa.eu/webapi/rdf/sparql). Nevertheless, we acknowledge that the way the data is displayed when accessing the URI does present blank nodes. Unfortunately, the issue is related to the rendering mechanism of the website. We are aware of the situation and we are looking for solutions to eliminate it. Until then, please use the standard SPARQL endpoint to access the data.
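Following the helpdesk's advice, the file extensions can be requested from the SPARQL endpoint directly. Below is a minimal sketch with Python's standard library. It assumes the endpoint follows the standard SPARQL protocol (GET with a `query` parameter) and honours an `Accept: application/sparql-results+json` header, which has not been verified against the live service here. The function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://publications.europa.eu/webapi/rdf/sparql"

QUERY = """
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX at: <http://publications.europa.eu/ontology/authority/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?fileType ?fileExtension
FROM <http://publications.europa.eu/resource/authority/file-type>
WHERE {
  ?fileType skos:inScheme <http://publications.europa.eu/resource/authority/file-type> ;
            at:op-mapped-code ?mapped .
  ?mapped dc:source "file-extension" ;
          at:legacy-code ?fileExtension .
}
ORDER BY ?fileType
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request asking for SPARQL JSON results (assumed to be supported)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def rows(results_json):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    data = json.loads(results_json)
    names = data["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in data["results"]["bindings"]
    ]


def fetch(query):
    """Run the query over the network and return the flattened rows."""
    with urllib.request.urlopen(build_request(query), timeout=30) as response:
        return rows(response.read().decode("utf-8"))
```

Calling `fetch(QUERY)` should then yield one dict per file type and extension, provided the assumptions about the endpoint hold.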
I was looking at the web information about ARC_GZ (as an example): https://op.europa.eu/de/web/eu-vocabularies/concept/-/resource?uri=http://publications.europa.eu/resource/authority/file-type/ARC_GZ
The page shows the MIME type as "application/gzip" and the file extension as ".arc.gz". I was looking for this information within the RDF representation at http://publications.europa.eu/resource/authority/file-type/ARC_GZ, but I cannot find it there. At http://publications.europa.eu/resource/authority/file-type/CSV the PO offers the information that it dcterms:conformsTo https://www.iana.org/assignments/media-types/text/csv
So the media type can be found there, but the file extension is still invisible in the RDF. For most (?) file types, the extension is identical with, or can be derived from, the data found in e.g. dc:identifier. But since the real information exists, this should not be the way to obtain it.
I also submitted this issue via the web form. I add it here for further reference and in case anyone has an idea which properties would be a good fit for this use case.