RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

How to use the values of a flat array when iterating on it with RML? #95

Closed vemonet closed 3 years ago

vemonet commented 3 years ago

Hi, I am having trouble finding out how to properly map flat array values to triple subjects when iterating over the array with RML.

Here is a practical example (ingredients.json):

[
  {
    "id": 1,
    "ingredients": [
      "garlic",
      "pepper"
    ]
  },
  {
    "id": 2,
    "ingredients": [
      "salt",
      "tomatoes"
    ]
  }
]

I would like to iterate over the ingredients to create subjects, but I cannot find how to reference the value of the array entries (all the documentation and tests I have found provide examples for arrays containing objects, whose values can then be referenced by the object key).

Here are the YARRRML mappings, with ??? as a placeholder:

prefixes:
  grel: "http://users.ugent.be/~bjdmeest/function/grel.ttl#"
  rdfs: "http://www.w3.org/2000/01/rdf-schema#"
  fo: "http://purl.org/ontology/fo/"
mappings:
  ingredients:
    sources:
      - ['ingredients.json~jsonpath', "$.[*].ingredients[*]"]
    s: fo:ingredient/$(???)
    po:
      - [rdfs:label, $(???)]

Which gives the corresponding RML, using https://rml.io/yarrrml/matey:

@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.
@prefix map: <http://mapping.example.com/>.

map:map_ingredients_0 rml:logicalSource map:source_0;
    a rr:TriplesMap;
    rdfs:label "ingredients";
    rr:subjectMap map:s_0;
    rr:predicateObjectMap map:pom_0.
map:om_0 a rr:ObjectMap;
    rml:reference "???";
    rr:termType rr:Literal.
map:pm_0 a rr:PredicateMap;
    rr:constant rdfs:label.
map:pom_0 a rr:PredicateObjectMap;
    rr:predicateMap map:pm_0;
    rr:objectMap map:om_0.
map:s_0 a rr:SubjectMap;
    rr:template "http://purl.org/ontology/fo/ingredient/{???}".
map:source_0 a rml:LogicalSource;
    rml:source "ingredients.json";
    rml:iterator "$.[*].ingredients[*]";
    rml:referenceFormulation ql:JSONPath.

What should be used instead of $(???) and {???} to reference the actual array value we are iterating over?
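To make the goal concrete, the result being asked for can be sketched in plain Python rather than RML. This only illustrates the intended triples and is not RMLMapper output; the helper name desired_triples is hypothetical, and the sketch assumes the ingredient strings need no IRI escaping.

```python
import json

# Same content as ingredients.json above, inlined to keep the sketch self-contained.
INGREDIENTS_JSON = """
[
  {"id": 1, "ingredients": ["garlic", "pepper"]},
  {"id": 2, "ingredients": ["salt", "tomatoes"]}
]
"""

SUBJECT_PREFIX = "http://purl.org/ontology/fo/ingredient/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def desired_triples(recipes):
    """Hypothetical helper: one (subject, predicate, label) triple per array entry."""
    triples = []
    for recipe in recipes:  # corresponds to $.[*]
        for ingredient in recipe["ingredients"]:  # corresponds to .ingredients[*]
            # The array entry itself is both the IRI suffix and the label.
            triples.append((SUBJECT_PREFIX + ingredient, RDFS_LABEL, ingredient))
    return triples


for subject, predicate, label in desired_triples(json.loads(INGREDIENTS_JSON)):
    print(f'<{subject}> <{predicate}> "{label}" .')
```

The inner loop variable is exactly the value that the ??? placeholder needs to refer to.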

micheldumontier commented 3 years ago

Is it not possible to write a module to handle this case? So much JSON data contains lists of values.

vemonet commented 3 years ago

In our case, this issue prevents us from using RML to define mappings to RDF and perform RDF conversion in our stack (running preprocessing to transform all JSON files into a new, specific "RML-compliant" JSON removes any benefit of using a mapping language).

This issue is at the root of how the RML specification has been defined, and there has been no discussion on how to solve it, so is it right not to expect that it will be solved soon?

Is there any information about how long such an issue could take to solve?

thomas-delva commented 3 years ago

Hi @vemonet and apologies for our silence on this issue!

We agree this feature would be quite easy to implement: it would only require changing a few lines around this part of the code.

The trouble, however, is deciding which behaviour is desirable. Until now we have treated JSONPath references as being relative to the current iteration: for example, an iterator $.people[*] and a "relative" reference name lead to an absolute path like $.people[*].name. This is consistent with how we treat XPath references.

We could change this (as you also suggested here) to treat JSONPath references as absolute on the objects extracted by the iterator (so name would be replaced with $.name, a full JSONPath expression). This would allow using $ as a reference, solving this issue. However, we need to decide whether this is the best solution, and it would require clarification in the spec and a solution for backwards compatibility. These things take some time on our end.
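The two interpretations can be contrasted with a minimal Python sketch. This is an illustration only and not the RMLMapper implementation: the function names and the sample object are hypothetical, and only "name", "$" and dotted "$.a.b" references are handled rather than full JSONPath.

```python
# Hypothetical illustration only; this is not how the RMLMapper is implemented.

def resolve_relative(item, reference):
    """Current behaviour: 'name' is looked up as a key of the iteration item."""
    if not isinstance(item, dict):
        return None  # a plain value such as "garlic" has no key to look up
    return item.get(reference)


def resolve_absolute(item, reference):
    """Proposed behaviour: the reference is a path rooted ($) at the iteration item."""
    if reference == "$":
        return item  # the item itself, which is what flat arrays need
    if not reference.startswith("$."):
        return None  # not a rooted path under this interpretation
    value = item
    for key in reference[2:].split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


person = {"name": "Alice"}  # made-up item yielded by an iterator like $.people[*]
print(resolve_relative(person, "name"))
print(resolve_absolute(person, "$.name"))
print(resolve_absolute("garlic", "$"))
```

Under the relative interpretation there is nothing to write for a plain string item, whereas the rooted interpretation can address it with $.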

We are working on this issue and will get back to you. In the meantime you might consider enforcing a behaviour that works for you by changing the lines I linked to above and rebuilding the RMLMapper.

More long-term, we are also working on a more general solution for nested data, together with the community group for KG construction, see for example the challenges related to multivalues here.

Hope this answers your concerns!

thomas-delva commented 3 years ago

Follow-up question: what do you think about this solution (i.e., using absolute JSONPath expressions like $.name, $, ...)? Would it solve all your related issues?

vemonet commented 3 years ago

Would it be too inconsistent to keep it exactly how it is, and just also allow the "full path" solution with $. (which would still be resolved relative to the current iteration)?

You can consider that $(name) is just a shortcut for $($.name) when you access it in the iteration (which is much easier to read and write in 95% of cases).

This would not increase complexity for most people, and would provide full backward compatibility (and it does not seem to introduce conflicts in path definitions).

Basically, you would be able to continue using $(name), and you could also be more explicit and use $($.name) for JSON objects if you really want to.

But in most cases you would only need it for flat arrays.

The best of both worlds!

All that is needed then is to add 5 lines to the docs with a complete example showing exactly how this use case is handled (such as the ingredients example shared above).
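The shortcut idea can be sketched in a few lines of Python. This is only a sketch of the proposal, not existing RMLMapper or YARRRML behaviour; the function name normalize_reference is hypothetical, and it assumes that no JSON key of interest itself starts with "$".

```python
def normalize_reference(reference):
    """Hypothetical helper: expand a short reference into a rooted path.

    "name" becomes "$.name"; anything already starting with "$" is kept as-is,
    so existing mappings keep working and "$" alone can address a flat array entry.
    """
    if reference.startswith("$"):
        return reference
    return "$." + reference


for ref in ("name", "$.name", "$"):
    print(ref, "->", normalize_reference(ref))
```

Existing references such as name need no change under this scheme, and only mappings over flat arrays would have to use the explicit $ form.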

vemonet commented 3 years ago

As mentioned in this pull request discussion, JSONPath allows using @ to reference the current node: https://github.com/kg-construct/mapping-challenges/pull/33

But it does not work with the RMLMapper. Here is a complete example to reproduce it:

ingredients.json:

[
  {
    "id": 1,
    "ingredients": [
      "garlic",
      "pepper"
    ]
  },
  {
    "id": 2,
    "ingredients": [
      "salt",
      "tomatoes"
    ]
  }
]

YARRRML mappings:

prefixes:
  grel: "http://users.ugent.be/~bjdmeest/function/grel.ttl#"
  rdfs: "http://www.w3.org/2000/01/rdf-schema#"
  fo: "http://purl.org/ontology/fo/"
mappings:
  ingredients:
    sources:
      - ['ingredients.json~jsonpath', "$.[*].ingredients[*]"]
    s: fo:ingredient/$(@)
    po:
      - [rdfs:label, $(@)]

RML mapping (using Matey):

@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.
@prefix map: <http://mapping.example.com/>.

map:map_ingredients_000 rml:logicalSource map:source_000;
    a rr:TriplesMap;
    rdfs:label "ingredients";
    rr:subjectMap map:s_000;
    rr:predicateObjectMap map:pom_000.
map:om_000 a rr:ObjectMap;
    rml:reference "@";
    rr:termType rr:Literal.
map:pm_000 a rr:PredicateMap;
    rr:constant rdfs:label.
map:pom_000 a rr:PredicateObjectMap;
    rr:predicateMap map:pm_000;
    rr:objectMap map:om_000.
map:rules_000 a <http://rdfs.org/ns/void#Dataset>;
    <http://rdfs.org/ns/void#exampleResource> map:map_ingredients_000.
map:s_000 a rr:SubjectMap;
    rr:template "http://purl.org/ontology/fo/ingredient/{@}".
map:source_000 a rml:LogicalSource;
    rml:source "ingredients.json";
    rml:iterator "$.[*].ingredients[*]";
    rml:referenceFormulation ql:JSONPath.

This does not return any triples (only the prefix declarations).

I ran it using the latest RMLMapper release from https://github.com/RMLio/rmlmapper-java/releases

java -jar ~/bin/rmlmapper.jar -m mappings.rml.ttl -o output.ttl -s turtle

Is there a plan to support JSONPath in the RML mapper?

DylanVanAssche commented 3 years ago

Hi @vemonet !

Thanks for providing a test case in https://github.com/RMLio/rmlmapper-java/pull/110 ! We worked on a fix and it will be available in the next release.

DylanVanAssche commented 3 years ago

This is now included in the latest release v4.9.4! Let us know if you have issues :)

vemonet commented 3 years ago

Thanks @DylanVanAssche ! It worked flawlessly on my side