dbpedia / archivo

DBpedia Archivo - Augmented Ontology Archive powered by Databus
https://archivo.dbpedia.org/
GNU Affero General Public License v3.0

Base Schema with fragment / hash in parsed turtle files leads to incorrect IRI resolution for some parsers #47

Open Lenostatos opened 1 month ago

Lenostatos commented 1 month ago

Hello,

I tried to parse the turtle of your RDF Schema ontology file at https://databus.dbpedia.org/ontologies/w3.org/2000--01--rdf-schema/2020.06.10-215336/2000--01--rdf-schema_type=parsed.ttl with a library that uses the N3.js parser.

There, I ran into a problem with the base IRI resolution. I opened an issue with the maintainers of that library, and it seems that there might be an error in the RDF Schema turtle file: https://github.com/rdfjs-base/parser-n3/issues/15

The problem is in the beginning of the turtle code:

@base <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <../../1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <> .
@prefix owl: <../../2002/07/owl#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<>
    dc:title "The RDF Schema vocabulary (RDFS)" ;
    a owl:Ontology ;
    rdfs:seeAlso <rdf-schema-more> .

rdfs:Class
    a rdfs:Class ;
    rdfs:comment "The class of classes." ;
    rdfs:isDefinedBy <> ;
    rdfs:label "Class" ;
    rdfs:subClassOf rdfs:Resource .

The hash symbol (#) at the end of the base IRI is apparently stripped by the parser (?) and the resulting triples then contain invalid IRIs.
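For reference, here is a minimal sketch of RFC 3986 reference resolution (section 5.2), which Turtle uses for relative IRIs, applied to the preamble above. It is not the N3.js implementation; the helper names `resolve` and `remove_dot_segments` are made up for illustration. Under RFC 3986 the fragment of the base is never carried over into the result, so `<>` resolves to the base without `#`, and every prefixed name built on `rdfs:` then lacks the separator.

```python
import re

# RFC 3986, Appendix B: splits a URI reference into
# scheme, authority, path, query and fragment.
_URI_RE = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$"
)


def remove_dot_segments(path):
    """Remove "." and ".." segments (RFC 3986, section 5.2.4)."""
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]  # "/./x" becomes "/x"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]  # "/../x" becomes "/x"
            if out:
                out.pop()
        elif path == "/..":
            path = "/"
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            m = re.match(r"/?[^/]*", path)
            out.append(m.group(0))
            path = path[m.end():]
    return "".join(out)


def resolve(base, ref):
    """Resolve `ref` against `base` (RFC 3986, sections 5.2.2 and 5.3)."""
    # The base fragment is parsed but deliberately never used.
    b_scheme, b_auth, b_path, b_query, _unused = _URI_RE.match(base).groups()
    r_scheme, r_auth, r_path, r_query, r_frag = _URI_RE.match(ref).groups()

    if r_scheme is not None:
        scheme, auth, query = r_scheme, r_auth, r_query
        path = remove_dot_segments(r_path)
    elif r_auth is not None:
        scheme, auth, query = b_scheme, r_auth, r_query
        path = remove_dot_segments(r_path)
    elif r_path == "":
        # Empty path: keep the base path (and base query unless overridden).
        scheme, auth, path = b_scheme, b_auth, b_path
        query = r_query if r_query is not None else b_query
    else:
        scheme, auth, query = b_scheme, b_auth, r_query
        if r_path.startswith("/"):
            path = remove_dot_segments(r_path)
        elif b_auth is not None and b_path == "":
            path = remove_dot_segments("/" + r_path)
        else:
            # Merge: replace the last segment of the base path with the reference.
            merged = b_path[: b_path.rfind("/") + 1] + r_path
            path = remove_dot_segments(merged)

    result = ""
    if scheme is not None:
        result += scheme + ":"
    if auth is not None:
        result += "//" + auth
    result += path
    if query is not None:
        result += "?" + query
    if r_frag is not None:  # only the reference's fragment survives
        result += "#" + r_frag
    return result


BASE = "http://www.w3.org/2000/01/rdf-schema#"

rdfs_prefix = resolve(BASE, "")  # @prefix rdfs: <> .
print(rdfs_prefix)               # http://www.w3.org/2000/01/rdf-schema
print(rdfs_prefix + "Class")     # http://www.w3.org/2000/01/rdf-schemaClass
print(resolve(BASE, "../../1999/02/22-rdf-syntax-ns#"))
# http://www.w3.org/1999/02/22-rdf-syntax-ns#
```

If this reading of the RFC is right, a parser that drops the `#` is following the specification, and the `@prefix rdfs: <>` line in the file is what produces IRIs such as `http://www.w3.org/2000/01/rdf-schemaClass`.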

Unfortunately, I don't have time right now to look too deeply into whether your turtle or the N3.js implementation is correct but I at least wanted to let you know about the issue.

JJ-Author commented 1 month ago

Really interesting feedback, thank you very much. We use the Raptor utility rapper for parsing, which creates this kind of prefix preamble. Indeed, reusing the base when defining the rdfs prefix adds more complexity than is actually needed, so for more reliable parsing I think it would be better not to have these relative IRIs in the prefix definitions. But even with that fixed, the first 3 triples would still be left broken...

I tried it with a tool based on RDFLib (https://rdftools.ga.gov.au/convert) and it works as expected, but this also has the same issue. I will take a deeper look to see what is actually correct and what could be done about it.

workaround at the moment:

Lenostatos commented 1 month ago

Thank you very much for looking into this @JJ-Author ! And also for the tip with the .nt files. That really helps 😄