RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

objects in an array in objects in an array #93

Open justin2004 opened 3 years ago

justin2004 commented 3 years ago

note that i am not concerned about using correct properties yet. i am just concerned with structure.

characters.json:

{
  "characters": [
    {
      "id": "0",
      "firstname": "Ash",
      "items":[ {"id":10,"name":"gloves", "weight":340},
                {"id":11,"name":"sword", "weight":44400}
      ]
    },
    {
      "id": "1",
      "firstname": "Misty",
      "items":[ {"id":12,"name":"gloves", "weight":340},
                {"id":13,"name":"mittens", "weight":300},
                {"id":14,"name":"hat", "weight":800}
      ]
    }
  ]
}

rml:

@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/rules/> .
@prefix ex: <http://example.org/example/> .
@prefix schema: <http://schema.org/> .
@prefix dbo: <http://dbpedia.org/ontology/> .

:LogicalSource a rml:LogicalSource ;
    rml:source "characters.json";
    rml:referenceFormulation ql:JSONPath;
    rml:iterator "$.characters[*]" .

:CharactersTriplesMap a rr:TriplesMap;
  rml:logicalSource :LogicalSource .

:CharactersTriplesMap rr:subjectMap [
  rr:template "http://example.org/character/{id}" ;
].

:CharactersTriplesMap rr:predicateObjectMap [
  rr:predicate rdf:type;
  rr:objectMap [
   rr:constant schema:Person
 ]
].

:CharactersTriplesMap rr:predicateObjectMap [
  rr:predicate schema:hasPossessions;
  rr:objectMap [
   rr:parentTriplesMap :TriplesMapItems;
   rr:joinCondition [
                     rr:child  "id"; 
                     rr:parent "id"; 
                    ]
 ]
].

####################

:TriplesMapItems a rr:TriplesMap ;
   rml:logicalSource [
    rml:source "characters.json";
    rml:referenceFormulation ql:JSONPath;
    rml:iterator "$.characters[*]" ] .

:TriplesMapItems rr:subjectMap [
        rr:template "http://characters.com/posessions/char/{id}/possessions/"; 
].

:TriplesMapItems rr:predicateObjectMap [
  rr:predicate rdf:type;
  rr:objectMap [
   rr:constant schema:Collection
               ] 
].

:TriplesMapItems rr:predicateObjectMap [
  rr:predicate schema:contains;
  rr:objectMap [
   rr:parentTriplesMap :TriplesMapItemsContents;
   rr:joinCondition [
                     rr:child  "items[0].id"; 
                     rr:parent "id"; 
                    ]
 ]
].

##########

:TriplesMapItemsContents a rr:TriplesMap ;
   rml:logicalSource [
    rml:source "characters.json";
    rml:referenceFormulation ql:JSONPath;
    rml:iterator "$.characters[*].items[*]" ] .

:TriplesMapItemsContents rr:subjectMap [
        rr:template "http://characters.com/items/{id}"; 
].

:TriplesMapItemsContents rr:predicateObjectMap [
  rr:predicate dbo:name;
  rr:objectMap [
    rml:reference "name"
  ]
].

which produces the triples i expect:

<http://characters.com/items/10> <http://dbpedia.org/ontology/name> "gloves".
<http://characters.com/items/11> <http://dbpedia.org/ontology/name> "sword".
<http://characters.com/items/12> <http://dbpedia.org/ontology/name> "gloves".
<http://characters.com/items/13> <http://dbpedia.org/ontology/name> "mittens".
<http://characters.com/items/14> <http://dbpedia.org/ontology/name> "hat".
<http://characters.com/posessions/char/0/possessions/> <http://schema.org/contains> <http://characters.com/items/10>.
<http://characters.com/posessions/char/0/possessions/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Collection>.
<http://characters.com/posessions/char/1/possessions/> <http://schema.org/contains> <http://characters.com/items/12>.
<http://characters.com/posessions/char/1/possessions/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Collection>.
<http://example.org/character/0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person>.
<http://example.org/character/0> <http://schema.org/hasPossessions> <http://characters.com/posessions/char/0/possessions/>.
<http://example.org/character/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person>.
<http://example.org/character/1> <http://schema.org/hasPossessions> <http://characters.com/posessions/char/1/possessions/>.

but notice i had to hardcode to the 0th item:

                     rr:child  "items[0].id"; 

i can manually iterate like:

                     rr:child  "items[1].id"; 

and

                     rr:child  "items[2].id"; 

and it works as expected.

but if i want to do them all at once:

                     rr:child  "items[*].id"; 

then i lose the triples matching ?s <http://schema.org/contains> ?o:

<http://characters.com/items/10> <http://dbpedia.org/ontology/name> "gloves".
<http://characters.com/items/11> <http://dbpedia.org/ontology/name> "sword".
<http://characters.com/items/12> <http://dbpedia.org/ontology/name> "gloves".
<http://characters.com/items/13> <http://dbpedia.org/ontology/name> "mittens".
<http://characters.com/items/14> <http://dbpedia.org/ontology/name> "hat".
<http://characters.com/posessions/char/0/possessions/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Collection>.
<http://characters.com/posessions/char/1/possessions/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Collection>.
<http://example.org/character/0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person>.
<http://example.org/character/0> <http://schema.org/hasPossessions> <http://characters.com/posessions/char/0/possessions/>.
<http://example.org/character/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person>.
<http://example.org/character/1> <http://schema.org/hasPossessions> <http://characters.com/posessions/char/1/possessions/>.

justin2004 commented 3 years ago

it looks like this example: https://github.com/RMLio/RML-Processor/tree/master/src/test/resources/example5 would do what i am trying to do but i can't get it to run with this new repository.

03:14:40.975 [main] ERROR be.ugent.rml.cli.Main               .main(179) - Unable to parse mapping rules as Turtle. Does the file exist and is it valid Turtle?

thomas-delva commented 3 years ago

Hi Justin,

Joining with rr:child "items[*].id" does not work because the reference "items[*].id" returns multiple values (the id of each item), whereas a join condition expects a reference that returns only one value.
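To make the difference concrete, here is a minimal Python sketch (not RMLMapper code). The helper `item_ids` is hypothetical; it only mimics what the references "items[0].id" and "items[*].id" select for one record produced by the iterator "$.characters[*]".

```python
# One record as produced by the iterator "$.characters[*]" (taken from characters.json).
ash = {
    "id": "0",
    "firstname": "Ash",
    "items": [
        {"id": 10, "name": "gloves", "weight": 340},
        {"id": 11, "name": "sword", "weight": 44400},
    ],
}


def item_ids(record, index=None):
    """Hypothetical stand-in for the references "items[N].id" and "items[*].id"."""
    items = record["items"]
    selected = items if index is None else items[index:index + 1]
    return [item["id"] for item in selected]


single = item_ids(ash, 0)  # like "items[0].id": exactly one value
multi = item_ids(ash)      # like "items[*].id": one value per item

print(single)  # [10]
print(multi)   # [10, 11]
```

The indexed reference yields a single value that can be compared with the parent's "id", while the wildcard reference yields a list, which is the case the join condition does not handle.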

You can however use the "items[*].id" reference without a join condition to get the required result. Namely, you can use it directly to create the objects of the schema:contains predicate:

:TriplesMapItems rr:predicateObjectMap [
  rr:predicate schema:contains;
  rr:objectMap [ rr:template "http://characters.com/items/{items[*].id}" ]
].

Which gives the desired output:

<http://characters.com/posessions/char/0/possessions/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Collection>.
<http://characters.com/posessions/char/0/possessions/> <http://schema.org/contains> <http://characters.com/items/10>.
<http://characters.com/posessions/char/0/possessions/> <http://schema.org/contains> <http://characters.com/items/11>.
<http://characters.com/posessions/char/1/possessions/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Collection>.
<http://characters.com/posessions/char/1/possessions/> <http://schema.org/contains> <http://characters.com/items/12>.
<http://characters.com/posessions/char/1/possessions/> <http://schema.org/contains> <http://characters.com/items/13>.
<http://characters.com/posessions/char/1/possessions/> <http://schema.org/contains> <http://characters.com/items/14>.
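The output above suggests that the template is expanded once per value returned by "items[*].id". A rough Python sketch of that expansion follows; it is an illustration under that assumption, not the mapper's implementation, and the names `contains_triples`, `SUBJECT` and `OBJECT` are made up for this example.

```python
import json

# Trimmed copy of characters.json; only the fields the templates need.
CHARACTERS_JSON = """
{
  "characters": [
    {"id": "0", "items": [{"id": 10}, {"id": 11}]},
    {"id": "1", "items": [{"id": 12}, {"id": 13}, {"id": 14}]}
  ]
}
"""

SUBJECT = "http://characters.com/posessions/char/{id}/possessions/"
OBJECT = "http://characters.com/items/{item_id}"
CONTAINS = "http://schema.org/contains"


def contains_triples(document):
    """Emit one schema:contains triple per item id, per character."""
    triples = []
    for character in document["characters"]:      # iterator $.characters[*]
        subject = SUBJECT.format(id=character["id"])
        for item in character["items"]:           # reference items[*].id
            obj = OBJECT.format(item_id=item["id"])
            triples.append(f"<{subject}> <{CONTAINS}> <{obj}>.")
    return triples


triples = contains_triples(json.loads(CHARACTERS_JSON))
for line in triples:
    print(line)
```

Because the object IRI is built inside the character's own iteration, each item stays attached to the right character without any join.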

The full mappings I used as well as their input and output are attached:

93.zip

justin2004 commented 3 years ago

hi @thomas-delva , thanks! that does work as long as i have a unique identifier for every object in a child array.

but if my input was:

{
  "characters": [
    {
      "id": "0",
      "firstname": "Ash",
      "items":[ {"name":"gloves", "weight":340},
                {"name":"sword", "weight":44400}
      ]
    },
    {
      "id": "1",
      "firstname": "Misty",
      "items":[ {"name":"gloves", "weight":340},
                {"name":"mittens", "weight":300},
                {"name":"hat", "weight":800}
      ]
    }
  ]
}

and if i want this output:

<http://characters.com/posessions/char/0/possessions/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Collection>.
<http://characters.com/posessions/char/0/possessions/> <http://schema.org/contains> _:b0 .
<http://characters.com/posessions/char/0/possessions/> <http://schema.org/contains> _:b1 .
<http://characters.com/posessions/char/1/possessions/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Collection>.
<http://characters.com/posessions/char/1/possessions/> <http://schema.org/contains> _:b2 .
<http://characters.com/posessions/char/1/possessions/> <http://schema.org/contains> _:b3 .
<http://characters.com/posessions/char/1/possessions/> <http://schema.org/contains> _:b4 .
_:b0 <http://schema.org/weight> "340".
_:b0 <http://schema.org/name> "gloves".
_:b1 <http://schema.org/weight> "44400".
_:b1 <http://schema.org/name> "sword".
_:b2 <http://schema.org/weight> "340".
_:b2 <http://schema.org/name> "gloves".
_:b3 <http://schema.org/weight> "300".
_:b3 <http://schema.org/name> "mittens".
_:b4 <http://schema.org/weight> "800".
_:b4 <http://schema.org/name> "hat".

how can i achieve that?

thomas-delva commented 3 years ago

Mapping nested items that have no identifier field of their own, as in this child array, is currently not possible with RML and JSONPath.

We will update this issue once a solution becomes available within RML.
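For reference, the requested output corresponds to a plain nested traversal in which every item keeps its parent character as context. The Python sketch below works outside RML and is not a proposed RML solution; the function name and the blank node labels b0, b1, ... are assumptions chosen to match the requested triples.

```python
from itertools import count

# The input from the previous comment: items carry no id of their own.
characters = [
    {"id": "0", "items": [{"name": "gloves", "weight": 340},
                          {"name": "sword", "weight": 44400}]},
    {"id": "1", "items": [{"name": "gloves", "weight": 340},
                          {"name": "mittens", "weight": 300},
                          {"name": "hat", "weight": 800}]},
]


def possessions_triples(characters):
    """Walk the nesting so each item is linked to its own character's collection."""
    labels = count()  # hypothetical blank node counter: b0, b1, ...
    triples = []
    for character in characters:
        collection = f"<http://characters.com/posessions/char/{character['id']}/possessions/>"
        for item in character["items"]:
            node = f"_:b{next(labels)}"
            triples.append(f"{collection} <http://schema.org/contains> {node} .")
            triples.append(f'{node} <http://schema.org/weight> "{item["weight"]}".')
            triples.append(f'{node} <http://schema.org/name> "{item["name"]}".')
    return triples


triples = possessions_triples(characters)
for line in triples:
    print(line)
```

The key point is that the blank node is created inside the loop over one character's items, which is exactly the parent context that a separate iterator over "$.characters[*].items[*]" no longer has.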

justin2004 commented 3 years ago

but it seems like this old example is doing something like that.

this line: https://github.com/RMLio/RML-Processor/blob/47644026c41f8a7da3a80a63907bb3040402805b/src/test/resources/example5/museum-model.rml.ttl#L159

picks out the appropriate objects (the nested children) without referencing the template URI: https://github.com/RMLio/RML-Processor/blob/47644026c41f8a7da3a80a63907bb3040402805b/src/test/resources/example5/museum-model.rml.ttl#L75

justin2004 commented 3 years ago

the old repository allowed

rml:iterator "$.[*].Sitter";

even though Sitter is an array.

but with this new repo it seems like you must:

rml:iterator "$.[*].Sitter[*]";

but then you get a cartesian product output which isn't desirable:

<http://ex.com/Neil%20Armstrong> <http://www.w3.org/2000/01/rdf-schema#label> "Neil Armstrong".
<http://ex.com/Buzz%20Aldrin> <http://www.w3.org/2000/01/rdf-schema#label> "Buzz Aldrin".
<http://ex.com/Michael%20Collins> <http://www.w3.org/2000/01/rdf-schema#label> "Michael Collins".
<http://ex.com/Neil%20Armstrong> <http://www.w3.org/2000/01/rdf-schema#label> "Neil Armstrong".
<http://ex.com/Henry%20Larcom%20Abbot> <http://www.w3.org/2000/01/rdf-schema#label> "Henry Larcom Abbot".
<http://ex.com/NPG_70_36> <http://example.org/example/P62_depicts> <http://ex.com/Neil%20Armstrong>.
<http://ex.com/NPG_70_36> <http://example.org/example/P62_depicts> <http://ex.com/Buzz%20Aldrin>.
<http://ex.com/NPG_70_36> <http://example.org/example/P62_depicts> <http://ex.com/Michael%20Collins>.
<http://ex.com/NPG_70_36> <http://example.org/example/P62_depicts> <http://ex.com/Neil%20Armstrong>.
<http://ex.com/NPG_70_36> <http://example.org/example/P62_depicts> <http://ex.com/Henry%20Larcom%20Abbot>.
<http://ex.com/NPG_70_36> <http://example.org/example/P102_has_title> "Apollo 11 Crew".
<http://ex.com/NPG_70_36> <http://example.org/example/P48_has_preferred_identifier> "NPG_70_36".
<http://ex.com/S_NPG_2010_51> <http://example.org/example/P62_depicts> <http://ex.com/Neil%20Armstrong>.
<http://ex.com/S_NPG_2010_51> <http://example.org/example/P62_depicts> <http://ex.com/Buzz%20Aldrin>.
<http://ex.com/S_NPG_2010_51> <http://example.org/example/P62_depicts> <http://ex.com/Michael%20Collins>.
<http://ex.com/S_NPG_2010_51> <http://example.org/example/P62_depicts> <http://ex.com/Neil%20Armstrong>.
<http://ex.com/S_NPG_2010_51> <http://example.org/example/P62_depicts> <http://ex.com/Henry%20Larcom%20Abbot>.
<http://ex.com/S_NPG_2010_51> <http://example.org/example/P102_has_title> "Neil Armstrong".
<http://ex.com/S_NPG_2010_51> <http://example.org/example/P48_has_preferred_identifier> "S_NPG_2010_51".
<http://ex.com/NPG_92_127> <http://example.org/example/P62_depicts> <http://ex.com/Neil%20Armstrong>.
<http://ex.com/NPG_92_127> <http://example.org/example/P62_depicts> <http://ex.com/Buzz%20Aldrin>.
<http://ex.com/NPG_92_127> <http://example.org/example/P62_depicts> <http://ex.com/Michael%20Collins>.
<http://ex.com/NPG_92_127> <http://example.org/example/P62_depicts> <http://ex.com/Neil%20Armstrong>.
<http://ex.com/NPG_92_127> <http://example.org/example/P62_depicts> <http://ex.com/Henry%20Larcom%20Abbot>.
<http://ex.com/NPG_92_127> <http://example.org/example/P102_has_title> "Henry Larcom Abbot".
<http://ex.com/NPG_92_127> <http://example.org/example/P48_has_preferred_identifier> "NPG_92_127".
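The 15 P62_depicts triples above are what you get when every artwork is paired with every sitter, which is how an object map that points at a parent triples map without any rr:joinCondition appears to behave here. The Python sketch below contrasts that pairing with a nested traversal. The records are reconstructed from the triples above, so the real museum.json may differ, and both function names are hypothetical.

```python
# Records reconstructed from the output above; an assumption about museum.json.
artworks = [
    {"Ref": "NPG_70_36", "Sitter": [{"Name": "Neil Armstrong"},
                                    {"Name": "Buzz Aldrin"},
                                    {"Name": "Michael Collins"}]},
    {"Ref": "S_NPG_2010_51", "Sitter": [{"Name": "Neil Armstrong"}]},
    {"Ref": "NPG_92_127", "Sitter": [{"Name": "Henry Larcom Abbot"}]},
]


def depicts_without_join(artworks):
    """Pair every artwork with every sitter found anywhere in the file."""
    sitters = [s["Name"] for a in artworks for s in a["Sitter"]]  # $.[*].Sitter[*]
    return [(a["Ref"], name) for a in artworks for name in sitters]


def depicts_nested(artworks):
    """Pair each artwork only with the sitters nested inside it."""
    return [(a["Ref"], s["Name"]) for a in artworks for s in a["Sitter"]]


cartesian = depicts_without_join(artworks)
nested = depicts_nested(artworks)

print(len(cartesian))  # 15
print(len(nested))     # 5
```

The second function produces the five pairs i actually want; the first reproduces the cartesian product.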

i used this stripped down mapping:

@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <http://example.org/rules/> .
@prefix ex: <http://example.org/example/> .
@prefix schema: <http://schema.org/> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:SitterMapping a rr:TriplesMap ;
  rml:logicalSource [
    rml:source "src/test/resources/example5/museum.json";
    rml:referenceFormulation ql:JSONPath;
    rml:iterator "$.[*].Sitter[*]";
  ];
  rr:subjectMap [
    rr:template "http://ex.com/{Name}" ; ];

  rr:predicateObjectMap 
  [
    rr:predicate rdfs:label;
    rr:objectMap 
    [
      rml:reference "Name" 
    ]
  ].

:ArtworkMapping a rr:TriplesMap ;
  rml:logicalSource [
    rml:source "src/test/resources/example5/museum.json";
    rml:referenceFormulation ql:JSONPath;
    rml:iterator "$.[*]" ] ;

  rr:subjectMap [
    rr:template "http://ex.com/{Ref}";
  ];

  rr:predicateObjectMap 
  [
    rr:predicate ex:P102_has_title;
    rr:objectMap 
    [
      rml:reference "Title" 
    ]
  ];

  rr:predicateObjectMap 
  [
    rr:predicate ex:P48_has_preferred_identifier;
    rr:objectMap 
    [
      rml:reference "Ref" 
    ]
  ];

  rr:predicateObjectMap 
  [
    rr:predicate ex:P62_depicts;
    rr:objectMap [ 
            rr:parentTriplesMap :SitterMapping ;
        ];
  ].

justin2004 commented 3 years ago

if it would help to see an implementation reference: JSON2RDF is able to do this conversion in a "lossless [manner] with the exception of array ordering and some datatype round-tripping."

it produced these triples:

[ <https://example.com/#characters>
          [ <https://example.com/#firstname>
                    "Misty" ;
            <https://example.com/#id>     "1" ;
            <https://example.com/#items>  [ <https://example.com/#name>    "hat" ;
                                            <https://example.com/#weight>  "800"^^<http://www.w3.org/2001/XMLSchema#int>
                                          ] ;
            <https://example.com/#items>  [ <https://example.com/#name>    "mittens" ;
                                            <https://example.com/#weight>  "300"^^<http://www.w3.org/2001/XMLSchema#int>
                                          ] ;
            <https://example.com/#items>  [ <https://example.com/#name>    "gloves" ;
                                            <https://example.com/#weight>  "340"^^<http://www.w3.org/2001/XMLSchema#int>
                                          ]
          ] ;
  <https://example.com/#characters>
          [ <https://example.com/#firstname>
                    "Ash" ;
            <https://example.com/#id>     "0" ;
            <https://example.com/#items>  [ <https://example.com/#name>    "sword" ;
                                            <https://example.com/#weight>  "44400"^^<http://www.w3.org/2001/XMLSchema#int>
                                          ] ;
            <https://example.com/#items>  [ <https://example.com/#name>    "gloves" ;
                                            <https://example.com/#weight>  "340"^^<http://www.w3.org/2001/XMLSchema#int>
                                          ]
          ]
] .

note the resultant triples respect the json ancestry so it is clear that misty's gloves and ash's gloves aren't necessarily the same thing.

though i don't think that repo uses jsonpath; it uses JsonParser instead, so maybe it isn't useful as a reference.

thomas-delva commented 3 years ago

Hi Justin,

You correctly noticed that the current behavior deviates from the older example. The issue with handling nested source data has been identified before by the broader RML community. We raised this issue as a challenge in the community group and we expect that solutions will be proposed by the end of the month. Once we have our solution, we will notify you so you can check it.

justin2004 commented 3 years ago

thanks for the links, @thomas-delva .

in the meantime i'm going to use bob's approach described here: http://www.bobdc.com/blog/partialschemas/

though i would prefer to use RML.

justin2004 commented 3 years ago

if anyone else has a similar json -> triples need: https://github.com/justin2004/rml-testing