RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

Support for nested data? #197

Closed paulmillar closed 10 months ago

paulmillar commented 1 year ago

A little while ago, I came across a very interesting paper, presented at the Knowledge Graph Construction Workshop 2021, called Integrating Nested Data into Knowledge Graphs with RML Fields by Thomas Delva, Dylan Van Assche, Pieter Heyvaert, Ben De Meester and Anastasia Dimou. A recording of Thomas' presentation is also available.

I believe the features that Thomas described are not currently supported in RMLMapper, but could be very useful.

I was wondering if there are any plans to include his work?

DylanVanAssche commented 1 year ago

Hi!

Nested data is already supported in the form of JSONPath or XPath expressions for JSON and XML data. The paper you are referring to gives more flexibility and aims to be a uniform expression for any kind of nested data.

> I believe the features that Thomas described are not currently supported in RMLMapper, but could be very useful.

True, those features were more like a proposal for the community to solve this problem. They are not implemented. The ideas are currently being discussed in the W3C Knowledge Graph Construction Community Group.

> I was wondering if there are any plans to include his work?

Work done on the RMLMapper, RMLStreamer, etc. is solely funded by research projects, so the priority of new features is heavily influenced by these projects. Currently, there are no direct plans to implement them, but we welcome any form of collaboration to make this a reality. Feel free to e-mail info@rml.io for collaborations.

paulmillar commented 1 year ago

Hi @DylanVanAssche ,

Thanks for the very quick and informative reply.

As it happens, my interest in Thomas' work on nested data stems from trying to work with JSON and nested data. Therefore, your comment about how nested data is already possible with JSONPath piqued my interest.

Perhaps I'm missing something (a very real possibility!), but when investigating this, I couldn't see how a JSONPath could work for the data I'm trying to process.

Here's a simplified JSON example, to illustrate what I'm trying to do.

[
 {
   "id": "https://example.org/something",
   "name": "The first thing",
   "addresses": [
     {
       "city": "Canberra"
     },
     {
       "city": "London"
     }
   ]
 },

 {
   "id": "https://example.org/another-thing",
   "name": "The second thing",
   "addresses": [
     {
       "city": "New York"
     }
   ]
 }
]

... and here is the corresponding RDF (without the prefixes), which I'm trying to generate with RMLMapper:

<https://example.org/something> a eg:Thing;
    skos:prefLabel "The first thing";
    eg:hasAddress <https://example.org/something/address-1>;
    eg:hasAddress <https://example.org/something/address-2>.

<https://example.org/something/address-1> a eg:Address;
    eg:city "Canberra".

<https://example.org/something/address-2> a eg:Address;
    eg:city "London".

<https://example.org/another-thing> a eg:Thing;
    skos:prefLabel "The second thing";
    eg:hasAddress <https://example.org/another-thing/address-1>.

<https://example.org/another-thing/address-1> a eg:Address;
    eg:city "New York".

I'm using a simple rml:logicalSource, something like:

  rml:logicalSource [
    rml:source "input.json";
    rml:referenceFormulation ql:JSONPath;
    rml:iterator "$.[*]"
  ];

Using this rml:logicalSource, generating the predicates about the top-level items in the JSON (the IRIs of type eg:Thing in the above example) is straightforward.

However, it wasn't clear to me how I could create "new" subject IRIs (e.g., <https://example.org/something/address-1> in the above example) from iterating over a relative JSONPath (e.g., addresses[*] in the above example).

It doesn't seem to be possible with a single rr:TriplesMap. If I've understood correctly, an IRI of type rr:TriplesMap contains exactly one rr:subjectMap predicate, with no possibility of declaring an "inner" rr:TriplesMap.

It also doesn't seem to be possible with multiple rr:TriplesMap IRIs. There doesn't seem to be a way to declare that an IRI of type rr:TriplesMap is (in some sense) "relative" to another IRI of type rr:TriplesMap, so that the first IRI's rml:logicalSource's JSONPath should be executed relative to the second IRI's JSONPath context.

Does this make sense? Am I missing something?

Cheers, Paul.

DylanVanAssche commented 1 year ago

Ah yes, you're hitting a limitation of JSONPath, I'm afraid... By nested data support with JSONPath, I mean that you can map nested data, but as soon as you need to link to higher levels from within the iterator, you hit a JSONPath limitation: you cannot go up the tree in JSONPath the way you can in XPath with the parent axis.

For these cases you could use Thomas' work, yes, if it were implemented.
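
The limitation can be illustrated with a minimal Python sketch. Plain Python stands in for a JSONPath engine here, and `iterate_addresses` is a hypothetical helper that mimics what an iterator such as `$[*].addresses[*]` would select. The data is the simplified example from earlier in this thread.

```python
import json

# Simplified sample data from this thread.
DATA = json.loads("""
[
  {"id": "https://example.org/something",
   "name": "The first thing",
   "addresses": [{"city": "Canberra"}, {"city": "London"}]},
  {"id": "https://example.org/another-thing",
   "name": "The second thing",
   "addresses": [{"city": "New York"}]}
]
""")


def iterate_addresses(records):
    """Hypothetical stand-in for evaluating the iterator $[*].addresses[*]."""
    return [address for record in records for address in record["addresses"]]


items = iterate_addresses(DATA)

# Each selected item is only the inner object. Nothing in it refers back to
# the enclosing record, so the parent's "id" cannot be reached from here.
for item in items:
    print(sorted(item.keys()))
```

Because each item exposes only `city`, a mapping that iterates at this level has no value from which to build the parent's IRI or the `eg:hasAddress` link.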

paulmillar commented 1 year ago

Was Thomas' work purely theoretical: describing how RML might be extended but without providing an implementation?

DylanVanAssche commented 1 year ago

Yes, it was a vision of how this problem could be solved, intended for discussion with the community. There was no implementation.

paulmillar commented 1 year ago

Ah, OK.

With the data I'm currently working on, this isn't a problem for me (at least, I have a work-around). However, the lack of support for nested data might become a problem in the future: the structure will likely evolve, but it's currently hard to predict in which way.
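
The work-around itself is not described in this thread. One possible approach, sketched here purely as an assumption, is to pre-process the JSON before mapping so that every nested address becomes a flat row carrying its parent's `id` and a 1-based position. The names `flatten_addresses`, `parent_id`, and `address_id` are hypothetical and are not part of RML or RMLMapper.

```python
import json

# Simplified sample data from this thread.
RECORDS = json.loads("""
[
  {"id": "https://example.org/something",
   "name": "The first thing",
   "addresses": [{"city": "Canberra"}, {"city": "London"}]},
  {"id": "https://example.org/another-thing",
   "name": "The second thing",
   "addresses": [{"city": "New York"}]}
]
""")


def flatten_addresses(records):
    """Return one flat row per nested address, carrying the parent's id."""
    rows = []
    for record in records:
        # Number addresses from 1 so the IRIs match the address-1, address-2 pattern.
        for index, address in enumerate(record.get("addresses", []), start=1):
            rows.append({
                "parent_id": record["id"],
                "address_id": f"{record['id']}/address-{index}",
                "city": address["city"],
            })
    return rows


flat = flatten_addresses(RECORDS)
print(json.dumps(flat, indent=2))
```

Written to a second file, the flattened rows could then serve as a separate logical source with a flat iterator, where `address_id` supplies the address subject and `parent_id` supplies the subject of the `eg:hasAddress` triple. The trade-off is an extra pre-processing step outside the mapping rules.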

Although the solution Thomas presented seemed elegant (at least to a non-expert like me), I'm more curious if any solution for handling nested data might become available through RML.

Was there any consensus, from these discussions, on how the community plans to tackle this problem?

DylanVanAssche commented 10 months ago

> Was there any consensus, from these discussions, on how the community plans to tackle this problem?

This will be picked up again; please join the meetings of the W3C kg-construct Community Group. I will close this issue for now, because it relates to the RML spec being revised, not to the RMLMapper engine. Thanks!