RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

Prefixes not used and full IRI displayed in generated data #233

Open nicolastoira opened 7 months ago

nicolastoira commented 7 months ago

I'm converting JSON files to Turtle files with the RMLMapper. My mapping file contains a long list of prefixes that should be used when the output data is generated. In general this works fine, and the generated data shows the IRIs in their prefixed form. However, in some cases the prefix is not used and the full IRI appears in the generated data. I was therefore wondering whether there is some intrinsic logic that rejects certain prefixes.

For example I have the following prefixes. The first one is correctly replaced while the second is not:

@prefix snomed: <http://snomed.info/id/> .
@prefix obi: <http://purl.obolibrary.org/obo/OBI_> .

Test data:

{
    "content": {
        "sphn:Assay": [
            {
                "sphn:hasCode": {
                    "termid": "OBI-0002188",
                    "iri": "http://purl.obolibrary.org/obo/OBI_0002188",
                    "sourceConceptID": "9a9e4310-8fb8-4bab-a875-4cd37cbd7025"
                }
            },
            {
                "sphn:hasCode": {
                    "termid": "SNOMED-CT-1149430001",
                    "iri": "http://snomed.info/id/1149430001",
                    "sourceConceptID": "9a9e4310-8fb8-4bab-a875-4cd37cbd7025"
                }
            }
        ]
    }
}

Generated data:

resource:PROVIDER-sphn-Assay-9a9e4310-8fb8-4bab-a875-4cd37cbd7025-sphn-Code-OBI-0002188
  a <http://purl.obolibrary.org/obo/OBI_0002188> .

resource:PROVIDER-sphn-Assay-9a9e4310-8fb8-4bab-a875-4cd37cbd7025-sphn-Code-SNOMED-CT-1149430001
  a snomed:1149430001 .

The mapping RML logic is the following:

:sphnAssay_sphnhasCode_rangesphnTerminology a rr:TriplesMap ;
    rml:logicalSource [ rml:iterator "$.content.sphn:Assay[*].sphn:hasCode" ;
            rml:referenceFormulation ql:JSONPath ;
            rml:source "patient_data_input.json" ] ;
    rr:predicateObjectMap [ rr:objectMap [ rml:reference "iri" ;
                    rr:termType rr:IRI ] ;
            rr:predicate rdf:type ] ;
    rr:subjectMap [ rr:template "resource:PROVIDER-sphn-Assay-{sourceConceptID}-sphn-Code-{termid}" ] .

As you can see, even though the prefix is defined in the RML mapping file, we get <http://purl.obolibrary.org/obo/OBI_0002188> while the expected result is obi:0002188. If I change the prefix to something like @prefix obi: <http://purl.obolibrary.org/obo/OBI/> . and change the input data to "iri": "http://purl.obolibrary.org/obo/OBI/0002188", it works as expected.

Do you see any issues with the prefix definition or is there any logic in the RML mapper that blocks the correct replacement of the namespace prefix? Thank you.

DylanVanAssche commented 7 months ago

Hi!

> Do you see any issues with the prefix definition or is there any logic in the RML mapper that blocks the correct replacement of the namespace prefix? Thank you.

In the RML mapping you provided, you seem to be using a Turtle prefixed name inside an rr:template: "resource:PROVIDER-sphn-Assay-{sourceConceptID}-sphn-Code-{termid}". The template is a plain string, so the prefix in it is not expanded to a full IRI. This won't work for other RDF serializations, as the result is not a proper IRI.
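A minimal sketch of one way to address this: expand the prefixed name to a full IRI before it goes into rr:template. The namespace https://example.org/resource/ and the helper expand_prefixed_name are hypothetical, since the thread does not show what resource: is bound to.

```python
# Hypothetical prefix table: the real binding for "resource:" is not shown
# in this thread, so an example.org placeholder is used.
PREFIXES = {"resource": "https://example.org/resource/"}


def expand_prefixed_name(value, prefixes):
    """Return value with a leading 'prefix:' replaced by its namespace IRI.

    Values whose prefix is not in the table (for example 'http://...')
    are returned unchanged.
    """
    prefix, sep, local = value.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return value


template = "resource:PROVIDER-sphn-Assay-{sourceConceptID}-sphn-Code-{termid}"
full_template = expand_prefixed_name(template, PREFIXES)
print(full_template)
```

The expanded string is what would go into rr:template. The {sourceConceptID} and {termid} placeholders are left untouched for the mapper to fill in.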

> we get <http://purl.obolibrary.org/obo/OBI_0002188> while the expected result should be obi:0002188.

The RDF library inside the RMLMapper generates Turtle in its own way and does not always use the shortcuts that the Turtle language offers. Unfortunately, this follows from the Turtle specification, which has no way to say 'use shortcut X' or 'expand Y'. It allows these shortcuts and leaves it to the implementation to pick one.

> If I modify the prefix to something like this @prefix obi: <http://purl.obolibrary.org/obo/OBI/> . and change the input data to "iri": "http://purl.obolibrary.org/obo/OBI/0002188" it works as expected.

Sometimes the RDF library picks up the original prefixes from the mapping, but not always. It depends on how the library resolves the RDF triples, which we have no control over. That said, I find this prefix a bit unusual:

@prefix obi: <http://purl.obolibrary.org/obo/OBI_> .

Here OBI_ is really part of the identifier rather than of the namespace, which is unusual. Normally a prefix namespace ends with # or /.
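One possible explanation, consistent with the behaviour reported above, is that the serializer splits each IRI into namespace and local name at the last # or / and only then looks for a matching prefix. The sketch below assumes that heuristic. It is not taken from the RMLMapper code, and split_iri and abbreviate are hypothetical helper names.

```python
def split_iri(iri):
    """Split an IRI into (namespace, local name).

    Assumed heuristic: cut after the first '#'; if there is none, after the
    last '/'; if there is none either, after the last ':'.
    """
    idx = iri.find("#")
    if idx == -1:
        idx = iri.rfind("/")
    if idx == -1:
        idx = iri.rfind(":")
    return iri[: idx + 1], iri[idx + 1 :]


def abbreviate(iri, prefixes):
    """Return 'prefix:local' if the split namespace is declared, else '<iri>'."""
    namespace, local = split_iri(iri)
    for prefix, declared_namespace in prefixes.items():
        if declared_namespace == namespace:
            return f"{prefix}:{local}"
    return f"<{iri}>"


# Prefixes as declared in the issue.
prefixes = {
    "snomed": "http://snomed.info/id/",
    "obi": "http://purl.obolibrary.org/obo/OBI_",
}

print(abbreviate("http://snomed.info/id/1149430001", prefixes))
print(abbreviate("http://purl.obolibrary.org/obo/OBI_0002188", prefixes))
```

Under this assumption the SNOMED IRI is abbreviated to snomed:1149430001, while the OBI IRI splits into the namespace http://purl.obolibrary.org/obo/ and the local name OBI_0002188, which matches no declared prefix and is therefore written in full. Declaring the namespace as http://purl.obolibrary.org/obo/OBI/ makes the split and the declaration line up again.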