RMLio / yarrrml-parser

A YARRRML parser library and CLI in Javascript
MIT License

yarrrml-parser: does not parse correctly when a subject map includes target and condition #124

Open SvenLieber opened 3 years ago

SvenLieber commented 3 years ago

Issue type: :bug: Bug

Description

When using version 1.3.0 of the yarrrml-parser with a YARRRML mapping that refers to a SPARQL endpoint as logical target, the target is not part of the RML output.

Steps

  1. sudo npm i -g @rmlio/yarrrml-parser
  2. yarrrml-parser --version results in 1.3.0
  3. yarrrml-parser -i mapping.yml -o mapping.ttl

mapping.yml

prefixes:
  ex: "http://example.org/ns#"
  sioc: "http://rdfs.org/sioc/ns#"
  dc: "http://purl.org/dc/elements/1.1/"
mappings:

  tweetCreation:
    sources:
      - access: "minimal.json"
        referenceFormulation: jsonpath
        iterator: "$.[*]"
    targets:
      - subject-target: "http://localhost:8080/bigdata/namespace/my-namespace/sparql/~sd"
    s: ex:tweet_$(status.id_str)
    po:
      - [a, sioc:Post]
      - [dc:title, "Tweet $(status.id_str)", en~lang]

In the resulting mapping.ttl I would expect information about the logical target (all triples should end up in that target), but it is missing:

:rules_000 a void:Dataset;
    void:exampleResource :map_tweetCreation_000.
:map_tweetCreation_000 rml:logicalSource :source_000.
:source_000 a rml:LogicalSource;
    rml:source "minimal.json";
    rml:iterator "$.[*]";
    rml:referenceFormulation ql:JSONPath.
:map_tweetCreation_000 a rr:TriplesMap;
    rdfs:label "tweetCreation".
:s_000 a rr:SubjectMap.
:map_tweetCreation_000 rr:subjectMap :s_000.
:s_000 rr:template "http://example.org/ns/besocial/data#tweet_{status.id_str}".
:pom_000 a rr:PredicateObjectMap.
:map_tweetCreation_000 rr:predicateObjectMap :pom_000.
:pm_000 a rr:PredicateMap.
:pom_000 rr:predicateMap :pm_000.
:pm_000 rr:constant rdf:type.
:pom_000 rr:objectMap :om_000.
:om_000 a rr:ObjectMap;
    rr:constant "http://rdfs.org/sioc/ns#Post";
    rr:termType rr:IRI.
:pom_001 a rr:PredicateObjectMap.
:map_tweetCreation_000 rr:predicateObjectMap :pom_001.
:pm_001 a rr:PredicateMap.
:pom_001 rr:predicateMap :pm_001.
:pm_001 rr:constant dc:title.
:pom_001 rr:objectMap :om_001.
:om_001 a rr:ObjectMap;
    rr:template "Tweet {status.id_str}";
    rr:termType rr:Literal;
    rml:languageMap :language_000.
:language_000 rr:constant "en".

Environment

pheyvaer commented 3 years ago

It works when you put the targets outside of the mapping, as shown here. The example that you give should still work, but I'll leave that to @DylanVanAssche 😉

SvenLieber commented 3 years ago

Thanks, that fixed part of the issue: the logical target is now correctly translated into RML. However, the proposed solution leads to an incorrect RML translation of the subject map when conditions are used. The generated RML contains JavaScript artifacts, which result in an error during RML mapping.

The following mapping

prefixes:
  ex: "http://example.org/ns#"
  sioc: "http://rdfs.org/sioc/ns#"
  dc: "http://purl.org/dc/elements/1.1/"
  grel: "http://users.ugent.be/~bjdmeest/function/grel.ttl#"  

targets:
  subject-target: ["http://localhost:8080/bigdata/namespace/my-namespace/sparql~sd"]

mappings:

  tweetsCHOTimelineObject:
    sources:
      - access: "real-life-example-response.json"
        referenceFormulation: jsonpath
        iterator: "$.[*]"
    s:  
      value: bsd:post_twitter_$(id_str)
      targets: subject-target
    condition:
      function: grel:string_contains
      parameters:
        - [grel:valueParameter, "$(warc-header.warc-type)_$(warc-header.warc-target-uri)"]
        - [grel:string_sub, "response_https://api.twitter.com/1.1/statuses/user_timeline.json"]
    po: 
      - [a, sioc:Post]
      - [dc:title, "Tweet $(id_str)", en~lang]

results in RML containing the following javascript artifact:

:pom_003 a rr:PredicateObjectMap;
    rr:predicateMap :pm_003.
:pm_003 a rr:PredicateMap;
    rr:constant idlab-fn:str.
:pom_003 rr:objectMap :om_003.
:om_003 a rr:ObjectMap;
    rr:constant "[object Object]";
    rr:termType rr:IRI.

which eventually leads to the following error during RML mapping:

22:47:51.604 [main] ERROR be.ugent.rml.Executor               .executeWithFunctionV5(166) - The subject "[object Object]" is not a valid IRI. Skipped.
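
For context, "[object Object]" is the string JavaScript produces when a plain object is coerced to a string. This suggests, as an assumption not confirmed in this thread, that the object form of the subject (value plus targets) is handled somewhere as if it were a plain string. A minimal sketch of that effect, where the `subject` object and the `subjectTemplate` helper are hypothetical and not the parser's real data structures:

```javascript
// Hypothetical stand-in for the object-form subject from the mapping above;
// this is NOT the parser's actual internal representation.
const subject = {
  value: 'bsd:post_twitter_$(id_str)',
  targets: 'subject-target',
};

// Coercing a plain object to a string yields the artifact seen in the RML.
const coerced = String(subject);

// Hypothetical helper: accept both the shortcut (string) form and the
// object form of a subject, and return the template string in either case.
function subjectTemplate(s) {
  return typeof s === 'string' ? s : s.value;
}

console.log(coerced);                  // [object Object]
console.log(subjectTemplate(subject)); // bsd:post_twitter_$(id_str)
```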

DylanVanAssche commented 3 years ago

Hi @SvenLieber !

When using version 1.3.0 of the yarrrml-parser with a YARRRML mapping that refers to a SPARQL endpoint as logical target, the target is not part of the RML output.

Ah that's probably a bug, this should work normally :)

results in RML containing the following javascript artifact:

Yet another bug...

Could you maybe provide a test case showing what the output should be for both bugs?

SvenLieber commented 3 years ago

Hi @DylanVanAssche

An input file on which both mappings above fail is the following: https://github.com/RMLio/social-media-archiving/blob/master/message-queue-mapper/function/data/message.json

SvenLieber commented 3 years ago

A workaround is to generate the RML without specifying a target on the subject map (defining the target only globally) and then add the link from the SubjectMap to the target in the RML output afterwards.

A subject map with a condition will result in the following RML where the SubjectMap is also a FunctionTermMap. The workaround is to append rml:logicalTarget <your-target> to the SubjectMap.


:s_000 a rr:SubjectMap;
    rml:logicalTarget :target_000 .

:map_tweetsCHOTimelineObject_000 rr:subjectMap :s_000.

:s_000 a fnml:FunctionTermMap;
    rr:termType rr:IRI;
    fnml:functionValue :fn_000.
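
The manual step above could be scripted. Below is a minimal, text-based sketch in JavaScript. It assumes the generated Turtle links subject maps with statements of the form `:map_... rr:subjectMap :s_000.` as in the output shown in this thread, that the `rml:` prefix is already declared in the file, and that `:target_000` is the identifier of the generated target. `appendLogicalTarget` is a hypothetical helper, not part of yarrrml-parser, and it uses a regular expression rather than a real Turtle parser.

```javascript
// Hypothetical post-processing helper (not part of yarrrml-parser).
// Appends one "rml:logicalTarget" statement per subject map found in the
// generated Turtle. Text-based: it relies on the flat output style shown
// above and does not parse Turtle properly.
function appendLogicalTarget(turtle, target) {
  // Matches e.g. ":map_x_000 rr:subjectMap :s_000." and captures ":s_000".
  const pattern = /rr:subjectMap\s+(:[A-Za-z0-9_]+)\s*[.;]/g;

  // Collect each subject map identifier once.
  const subjectMaps = new Set();
  for (const match of turtle.matchAll(pattern)) {
    subjectMaps.add(match[1]);
  }
  if (subjectMaps.size === 0) {
    return turtle; // nothing to link, leave the input untouched
  }

  const extra = [...subjectMaps].map(
    (s) => `${s} rml:logicalTarget ${target} .`
  );
  return `${turtle.trimEnd()}\n${extra.join('\n')}\n`;
}
```

To apply it to a file, read the generated mapping.ttl with Node's `fs.readFileSync`, pass the text and the target identifier to the function, and write the result back with `fs.writeFileSync`. Note that this links every subject map in the file to the same target, which matches the single-mapping case discussed here but may be too broad for larger mappings.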