RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

Best practice to create unique IRIs for nested json data #162

Closed tjroamer closed 11 months ago

tjroamer commented 2 years ago

I am having difficulty constructing IRIs for my nested JSON data. The example JSON is:

[
    {
      "id": "A1",
      "attr": "value",
      "components": [
        {
           "id": "B1",
           "attr": "value",
           "components": [
              {
                 "id": "C1",
                 "attr": "value"
              },
              {
                 "id": "C2",
                 "attr": "value"
              }
           ]
        },
        {
           "id": "B2",
           "attr": "value"
        }
      ]
    }
]

The nesting can be very deep. I want to write a generic RML rule that generates triples like this:

###### EXPECTED:

<http://data/A1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://model/Component>.
<http://data/A1> <http://model/child> <http://data/A1/B1>.
<http://data/A1> <http://model/child> <http://data/A1/B2>.

<http://data/A1/B1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://model/Component>.
<http://data/A1/B1> <http://model/child> <http://data/A1/B1/C1>.
<http://data/A1/B1> <http://model/child> <http://data/A1/B1/C2>.

<http://data/A1/B1/C1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://model/Component>.
<http://data/A1/B1/C2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://model/Component>.

I ended up writing a rule for each level, but I still have a problem generating the IRIs.


@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix iasis: <http://project-iasis.eu/vocab/> .
@prefix model: <http://model/> .
@prefix data: <http://data/> .
@base <http://project-iasis.eu/> .

<LevelAMap>
  a rr:TriplesMap;

  rml:logicalSource [
    rml:source "example.json";
    rml:referenceFormulation ql:JSONPath;
    rml:iterator "$.[*]"
  ];
  rr:subjectMap [
    rr:template "http://data/{id}";
    rr:class model:Component;
  ];
  rr:predicateObjectMap [
    rr:predicate model:child;
    rr:objectMap [
      rr:template "http://data/{id}/{components[*].id}";
      rr:termType rr:IRI
   ]
  ];
.

<LevelBMap>
  a rr:TriplesMap;

  rml:logicalSource [
    rml:source "example.json";
    rml:referenceFormulation ql:JSONPath;
    rml:iterator "$.[*].components[*]"
  ];
  rr:subjectMap [
    rr:template "http://data/{id}";  ########### Problem to generate multi-level IRI!!
    rr:class model:Component;
  ];
  rr:predicateObjectMap [
    rr:predicate model:child;
    rr:objectMap [
      rr:template "http://data/{id}/{components[*].id}";
      rr:termType rr:IRI
   ]
  ];
.

But with the above mapping, I get the following result:

<http://data/A1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://model/Component>.
<http://data/A1> <http://model/child> <http://data/A1/B1>.
<http://data/A1> <http://model/child> <http://data/A1/B2>.
<http://data/B1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://model/Component>.
<http://data/B1> <http://model/child> <http://data/B1/C1>.
<http://data/B1> <http://model/child> <http://data/B1/C2>.
<http://data/B2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://model/Component>.

I think this pattern is quite common in structured data. Is there a way to handle this kind of data so that the generated Linked Data has unique IRIs? Any suggestions are much appreciated. Thanks.

DylanVanAssche commented 2 years ago

Hi!

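I'm afraid this is a limitation of JSONPath expressions. With rml:iterator "$.[*].components[*]", every reference is evaluated relative to the selected component, and JSONPath has no way to step back up to the parent, so the higher-level id you want is out of reach.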

You could try using multiple Triples Maps (one for each level) and then link them to each other with a join condition on the model:child predicate?
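
A minimal sketch of what that join could look like, reusing the LevelAMap from the question and assuming <LevelBMap> stays defined as it is there. The choice of "components[*].id" as the child reference is an assumption made for illustration. Whether a multi-valued child reference is matched element by element may depend on the mapper version, so this is worth verifying on a small sample before relying on it.

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix model: <http://model/> .
@base <http://project-iasis.eu/> .

# Sketch only: LevelAMap from the question, with the templated object map
# replaced by a join against <LevelBMap> (assumed to be defined as in the question).
<LevelAMap>
  a rr:TriplesMap;

  rml:logicalSource [
    rml:source "example.json";
    rml:referenceFormulation ql:JSONPath;
    rml:iterator "$.[*]"
  ];
  rr:subjectMap [
    rr:template "http://data/{id}";
    rr:class model:Component
  ];
  rr:predicateObjectMap [
    rr:predicate model:child;
    rr:objectMap [
      # The object is the subject that <LevelBMap> generates ...
      rr:parentTriplesMap <LevelBMap>;
      # ... for each B-level record whose id occurs among this record's components.
      rr:joinCondition [
        rr:child "components[*].id";   # evaluated against the LevelAMap iterator
        rr:parent "id"                 # evaluated against the LevelBMap iterator
      ]
    ]
  ] .
```

Note that the join only links the two maps. The child IRIs are whatever the subject template of <LevelBMap> produces (http://data/B1 here, not http://data/A1/B1), so this gives unique IRIs only if the id values are unique across the whole file rather than just within one parent.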

DylanVanAssche commented 11 months ago

No response for a while; re-open if needed.