vemonet closed this issue 3 years ago.
Is it not possible to write a module to handle this case? So much JSON data contains lists of values.
In our case, this issue prevents us from using RML to define mappings to RDF and to perform RDF conversion in our stack. Running a preprocessing step that transforms every JSON file into a new, specific "RML-compliant" JSON removes the whole point of using a mapping language.
This issue is rooted in how the RML specification was defined, and there has been no discussion on how to solve it. Is it fair to assume, then, that it will not be solved soon?
Is there any estimate of how long it could take to solve such an issue?
Hi @vemonet and apologies for our silence on this issue!
We agree this feature would be quite easy to implement; it would only require changing a few lines around this part of the code.
The trouble, however, is deciding which behaviour is desirable. Until now we have treated JSONPath references as relative to the current iteration: for example, an iterator $.people[*] and a "relative" reference name lead to the absolute path $.people[*].name. This is consistent with how we treat XPath references. We could change this (as you also suggested here) to treat JSONPath references as absolute on the objects extracted by the iterator (so replace name with $.name, a full JSONPath expression). This would allow using $ as a reference, solving this issue. However, we need to decide whether this is the best solution, and it would require clarification in the spec and a solution for backwards compatibility. These things take some time on our end.
We are working on this issue and will get back to you. In the meantime you might consider enforcing a behaviour that works for you by changing the lines I linked to above and rebuilding the RMLMapper.
More long-term, we are also working on a more general solution for nested data, together with the community group for KG construction, see for example the challenges related to multivalues here.
Hope this answers your concerns!
Follow-up question: what do you think about this solution (i.e., using absolute JSONPath expressions like $.name, $, ...)? Would it solve all your related issues?
Would it be too "inconsistent" to keep it exactly as it is, and just also allow the "full path" form with $. (which would still be based on the relative path of the iteration)?
You can consider that $(name) is just a shortcut for $($.name) when you access it in the iteration (which is much easier to read and write in 95% of cases).
This would not add complexity for most people, would provide full backward compatibility, and does not seem to introduce conflicts in path definitions.
Basically, you would still be able to use $(name), and you could also be more expressive and use $($.name) for JSON objects if you really want to.
But in most cases you will just need it for flat arrays.
The best of both worlds!
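The shortcut idea can be sketched in a few lines of Python. This assumes a toy resolver that only handles $ and $.key; normalize_reference and resolve are hypothetical names, not part of the RMLMapper.

```python
def normalize_reference(reference: str) -> str:
    """Proposed shortcut: a bare key such as 'name' means '$.name';
    references that already start with '$' are kept as written."""
    if reference.startswith("$"):
        return reference
    return f"$.{reference}"


def resolve(node, reference: str):
    """Evaluate a reference on the iterated node (toy subset: '$', '$.key')."""
    path = normalize_reference(reference)
    if path == "$":
        return node
    if path.startswith("$."):
        return node[path[2:]]
    raise ValueError(f"unsupported reference in this sketch: {reference!r}")


recipe = {"id": 1, "ingredients": ["garlic", "pepper"]}

print(resolve(recipe, "id"), resolve(recipe, "$.id"))    # 1 1
print([resolve(v, "$") for v in recipe["ingredients"]])  # ['garlic', 'pepper']
```

Existing references like $(id) keep working because they are rewritten to $.id, while $($) becomes available for flat arrays. The sketch also exposes one edge case: a JSON key that itself begins with $ could no longer be written in the bare form.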
All you would then need is to add 5 lines to the docs with a complete example showing exactly how this use case is handled (such as the ingredients example shared above).
As mentioned in this pull request discussion, JSONPath allows using @ to reference the current node: https://github.com/kg-construct/mapping-challenges/pull/33
But this does not work in the RMLMapper. Here is a complete example to reproduce it:
ingredients.json:
[
  {
    "id": 1,
    "ingredients": [
      "garlic",
      "pepper"
    ]
  },
  {
    "id": 2,
    "ingredients": [
      "salt",
      "tomatoes"
    ]
  }
]
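To show what the iterator in the following mappings is expected to select, here is a hand-written Python equivalent, assuming "$.[*].ingredients[*]" is read as $[*].ingredients[*]. No JSONPath library is involved, and iterate_ingredients is a hypothetical helper. Every item it yields is a bare string with no key that a reference like $(name) could point to, which is why a self-reference such as @ is needed.

```python
import json

# Same content as ingredients.json above.
DATA = """
[
  {"id": 1, "ingredients": ["garlic", "pepper"]},
  {"id": 2, "ingredients": ["salt", "tomatoes"]}
]
"""


def iterate_ingredients(doc):
    """Hand-written equivalent of $[*].ingredients[*]:
    yield every entry of every record's 'ingredients' array."""
    for record in doc:                       # $[*]
        for value in record["ingredients"]:  # .ingredients[*]
            yield value


values = list(iterate_ingredients(json.loads(DATA)))
print(values)  # ['garlic', 'pepper', 'salt', 'tomatoes']
```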
YARRRML mappings:
prefixes:
  grel: "http://users.ugent.be/~bjdmeest/function/grel.ttl#"
  rdfs: "http://www.w3.org/2000/01/rdf-schema#"
  fo: "http://purl.org/ontology/fo/"
mappings:
  ingredients:
    sources:
      - ['ingredients.json~jsonpath', "$.[*].ingredients[*]"]
    s: fo:ingredient/$(@)
    po:
      - [rdfs:label, $(@)]
RML mapping (using Matey):
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.
@prefix map: <http://mapping.example.com/>.
map:map_ingredients_000 rml:logicalSource map:source_000;
a rr:TriplesMap;
rdfs:label "ingredients";
rr:subjectMap map:s_000;
rr:predicateObjectMap map:pom_000.
map:om_000 a rr:ObjectMap;
rml:reference "@";
rr:termType rr:Literal.
map:pm_000 a rr:PredicateMap;
rr:constant rdfs:label.
map:pom_000 a rr:PredicateObjectMap;
rr:predicateMap map:pm_000;
rr:objectMap map:om_000.
map:rules_000 a <http://rdfs.org/ns/void#Dataset>;
<http://rdfs.org/ns/void#exampleResource> map:map_ingredients_000.
map:s_000 a rr:SubjectMap;
rr:template "http://purl.org/ontology/fo/ingredient/{@}".
map:source_000 a rml:LogicalSource;
rml:source "ingredients.json";
rml:iterator "$.[*].ingredients[*]";
rml:referenceFormulation ql:JSONPath.
This does not produce any triples (only the prefix declarations).
I ran it using the latest RMLMapper release from https://github.com/RMLio/rmlmapper-java/releases:
java -jar ~/bin/rmlmapper.jar -m mappings.rml.ttl -o output.ttl -s turtle
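For comparison, here is a Python sketch that writes out as N-Triples what the mapping should produce if @ resolved to the current array entry. This is a hand-derived expectation, not RMLMapper output. expected_triples is a hypothetical helper, and urllib.parse.quote only approximates the IRI-safe encoding that R2RML applies to template values.

```python
from urllib.parse import quote

FO = "http://purl.org/ontology/fo/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def expected_triples(values):
    """One subject per ingredient (template .../ingredient/{@}),
    labelled with the ingredient string via rdfs:label."""
    triples = []
    for value in values:
        # Percent-encode the value before inserting it into the IRI template.
        subject = f"<{FO}ingredient/{quote(value, safe='')}>"
        # Escape backslashes and quotes for an N-Triples string literal.
        literal = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        triples.append(f"{subject} <{RDFS_LABEL}> {literal} .")
    return triples


for line in expected_triples(["garlic", "pepper", "salt", "tomatoes"]):
    print(line)
```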
Is there a plan to support @ in JSONPath references in the RMLMapper?
Hi @vemonet !
Thanks for providing a test case in https://github.com/RMLio/rmlmapper-java/pull/110 ! We worked on a fix and it will be available in the next release.
This is now included in the latest release v4.9.4! Let us know if you have issues :)
Thanks @DylanVanAssche ! It worked flawlessly on my side
Hi, I am having trouble finding how to properly map flat array values to triple subjects when iterating over the array with RML.
Here is a practical example (ingredients.json):
I would like to iterate over the ingredients to create subjects, but I cannot find how to reference the value of the array entries. All the documentation and tests I have found give examples for arrays containing objects, which can then be referenced by object key.
Here are the YARRRML mappings; I put ??? as a placeholder in them:
This gives the corresponding RML, generated with https://rml.io/yarrrml/matey:
What should be used to reference the actual array value we are iterating over, instead of $(???) and {???}?