RMLio / yarrrml-parser

A YARRRML parser library and CLI in JavaScript
MIT License

Cross-product of links between instances #192

Closed: namedgraph closed this issue 1 year ago

namedgraph commented 1 year ago

Issue type: Bug

sources:
  Smth:
    access: input.ndjson
    referenceFormulation: jsonpath
    iterator: "$.key[*]"
mappings:
  Concept:
    sources: Smth
    graph: smth:$(@)
    s: smth:$(@)#this
    po:
      - [ a, skos:Concept ]
      - [ skos:prefLabel, $(@) ]
  Document:
    sources: Smth
    graph: smth:$(@)
    s: smth:$(@)
    po:
      - [ a, foaf:Document ]
      - p: foaf:primaryTopic
        o:
          - mapping: Concept

I am getting a cross-product in the output, i.e. if there are N rows, I'm getting this Document output:

<instance1> foaf:primaryTopic <instance1#this> .
<instance1> foaf:primaryTopic <instance2#this> .
...
<instance1> foaf:primaryTopic <instanceN#this> .
<instance2> foaf:primaryTopic <instance1#this> .
<instance2> foaf:primaryTopic <instance2#this> .
...
<instance2> foaf:primaryTopic <instanceN#this> .
...
<instanceN> foaf:primaryTopic <instance1#this> .
<instanceN> foaf:primaryTopic <instance2#this> .
...
<instanceN> foaf:primaryTopic <instanceN#this> .

Whereas I only want to pair each Document instance with its corresponding Concept instance:

<instance1> foaf:primaryTopic <instance1#this> .
<instance2> foaf:primaryTopic <instance2#this> .
...
<instanceN> foaf:primaryTopic <instanceN#this> .

Is there a way to express what I need with YARRRML?

pheyvaer commented 1 year ago

Hi @namedgraph, what does your input data look like?

namedgraph commented 1 year ago

One row looks like this:

{"key":["aaaaaa","bbbbbbbb","cccc","ddddddd"]}

I added sources to the mapping BTW.
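
The cross-product reported above can be modelled with a short Python sketch. This is a simplified model of the behaviour described in this thread, not RMLMapper's implementation: it takes the sample row, treats each item yielded by the iterator `$.key[*]` as one iteration (so `$(@)` is the item itself), and links every Document iteration to every Concept iteration because the referencing object map has no condition. The `smth:` and `foaf:` prefixes are kept as unexpanded strings, and `link_without_condition` is a hypothetical helper name.

```python
import json

# Sample row from the comment above.
row = json.loads('{"key":["aaaaaa","bbbbbbbb","cccc","ddddddd"]}')

# Each item yielded by the iterator "$.key[*]" is one iteration; $(@) is the item itself.
items = row["key"]


def link_without_condition(document_items, concept_items):
    """Model a referencing object map with no condition:
    every Document subject is linked to every Concept subject."""
    return [
        (f"smth:{doc}", "foaf:primaryTopic", f"smth:{concept}#this")
        for doc in document_items
        for concept in concept_items
    ]


links = link_without_condition(items, items)
print(len(links))  # 16, i.e. 4 * 4
```

With N items the sketch yields N * N links, which matches the shape of the output listed in the issue.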

pheyvaer commented 1 year ago

It's only possible if there is a unique way to identify a row and then only link the rows that are the same. For example, if every row has an index, you can add a condition to your mapping so that it only links when the indexes of the rows are equal.
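
A sketch of the index-based approach, under loud assumptions: the `index` field is hypothetical (the input shown in this thread has no such field), and the records are plain Python dicts standing in for iterations of the Document and Concept mappings. The condition keeps only pairs whose indexes are equal. Grouping by index is a hash join chosen here for brevity; whether RMLMapper evaluates conditions this way is not stated in this thread.

```python
from collections import defaultdict

# Hypothetical iterations: the "index" field is an assumption, not part of the original data.
documents = [{"index": 0, "value": "aaaaaa"}, {"index": 1, "value": "cccc"}]
concepts = [{"index": 0, "value": "aaaaaa"}, {"index": 1, "value": "cccc"}]


def join_on_index(document_rows, concept_rows):
    """Link a Document to a Concept only when their indexes are equal."""
    # Group Concept rows by index so each Document row only meets its matches.
    concepts_by_index = defaultdict(list)
    for concept in concept_rows:
        concepts_by_index[concept["index"]].append(concept)

    links = []
    for doc in document_rows:
        for concept in concepts_by_index.get(doc["index"], []):
            links.append((f"smth:{doc['value']}", f"smth:{concept['value']}#this"))
    return links


print(join_on_index(documents, concepts))
# [('smth:aaaaaa', 'smth:aaaaaa#this'), ('smth:cccc', 'smth:cccc#this')]
```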

namedgraph commented 1 year ago

I see... I might need to add that.

These are annoying shortcomings IMO. It would not be a problem using XSLT, for example.

namedgraph commented 1 year ago

But wait... I would need a different iterator then?

pheyvaer commented 1 year ago

No, that is not needed.

namedgraph commented 1 year ago

But in the mapping I'm using the array item (e.g. "cccc") as the value, via $(@). It's those values I need to compare, not row IDs. If I were to add IDs for those values, wouldn't I need to change the whole JSON structure within the array?

pheyvaer commented 1 year ago

No, that's not needed. The following works for me:

sources:
  Smth:
    access: data.json
    referenceFormulation: jsonpath
    iterator: "$.key[*]"
mappings:
  Concept:
    sources: Smth
    s: ex:$(@)#this
    po:
      - [ a, skos:Concept ]
      - [ skos:prefLabel, $(@) ]
  Document:
    sources: Smth
    s: ex:$(@)
    po:
      - [ a, foaf:Document ]
      - p: foaf:primaryTopic
        o:
          - mapping: Concept
            condition:
              function: equal
              parameters:
                - [str1, $(@)]
                - [str2, $(@)]

namedgraph commented 1 year ago

Thanks! Will try.

I considered this, but it wasn't obvious to me how comparing $(@) to $(@) could ever be false.

pheyvaer commented 1 year ago

We compare every element in the array (via Concept) with every element in the array (via Document). So we have

namedgraph commented 1 year ago

That part I understand. But doesn't that mean that $(@) in str1 refers to a different value than $(@) in str2?

pheyvaer commented 1 year ago

In str1 we refer to the rows in Document and in str2 we refer to the rows in Concept.

namedgraph commented 1 year ago

That explains the result, and it will be useful in my case. But my point is that it's counter-intuitive and unusual for the same variable ($(@)) to refer to different values in the same context.

namedgraph commented 1 year ago

I just tried your suggestion with condition and it doesn't work for me -- the result is the same as without it. Are you sure you tested with more than one row of JSONL?

pheyvaer commented 1 year ago

Well, it's not the same context actually, but for equal the context has a default if the user doesn't provide one. This is explained here:

But when a condition is used an extra value can be given to a parameter of a function. This is either s or o. s means that the value of the parameter is coming from the subject of the relationship, while o means that the value is coming from the object of the relationship. The default value is s. In this example it would result in relationships between every person and their projects.
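
The reason `$(@)` compared with `$(@)` can be false is that each parameter is resolved in its own context. The Python sketch below models that as a simplification, not as the RMLMapper or yarrrml-parser implementation. Following the maintainer's explanation in this thread, `str1` is resolved against the subject side (`s`, the Document iteration) and `str2` against the object side (`o`, the Concept iteration). The `resolve`, `equal_condition`, and `link` helpers are hypothetical names, and only the `@` reference is modelled.

```python
# Items yielded by the iterator "$.key[*]" for the sample row in this thread.
items = ["aaaaaa", "bbbbbbbb", "cccc", "ddddddd"]


def resolve(reference, context, subject_item, object_item):
    """Resolve a reference against the subject ("s") or object ("o") iteration."""
    if reference != "@":
        raise ValueError("only the $(@) reference is modelled in this sketch")
    return subject_item if context == "s" else object_item


def equal_condition(subject_item, object_item, str1_ctx="s", str2_ctx="o"):
    """Hypothetical model of `equal`: str1 defaults to s (Document), str2 to o (Concept)."""
    str1 = resolve("@", str1_ctx, subject_item, object_item)
    str2 = resolve("@", str2_ctx, subject_item, object_item)
    return str1 == str2


def link(str1_ctx="s", str2_ctx="o"):
    """Link each Document item to every Concept item that satisfies the condition."""
    return [
        (f"smth:{doc}", f"smth:{concept}#this")
        for doc in items
        for concept in items
        if equal_condition(doc, concept, str1_ctx, str2_ctx)
    ]


print(len(link()))          # 4: each Document is paired with its own Concept
print(len(link("s", "s")))  # 16: both values come from the Document, so the condition is always true
```

The second call shows why the per-parameter context matters: if both values were read from the same iteration, the condition could never filter anything out and the cross-product would remain.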

pheyvaer commented 1 year ago

Regarding JSONL, by default only standard JSON is supported.

namedgraph commented 1 year ago

Disregard the JSONL comment...

I managed to reproduce your condition-based results using the rmlio/rmlmapper-java Docker image, but not in the Java code (using be.ugent.rml:rmlmapper:6.1.3) 🤔

namedgraph commented 1 year ago

Turns out it's a bug in our custom executor 😅 Sorry for the noise.