RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

ClassCastException in boolean_and for string_contains #112

Open SvenLieber opened 3 years ago

SvenLieber commented 3 years ago

Hi,

In one of my mappings I use a boolean conjunction as a condition: a subject should only be created if the field warc-header.warc-type has the string value response and the string in the field warc-header.warc-target-uri contains the substring show.

However, I get a java.lang.ClassCastException, as shown below, when using the results of the function grel:string_contains in grel:boolean_and. Attached are a minimal example in both YARRRML and RML as well as example data. I used RMLMapper version 4.9.1 and YARRRML-parser version 1.1.1.

Error message

16:56:21.411 [main] DEBUG c.j.j.internal.path.CompiledPath    .evaluate(47) - Evaluating path: $[*]
16:56:21.449 [main] DEBUG c.j.j.internal.path.CompiledPath    .evaluate(47) - Evaluating path: $[0]['warc-header']['warc-type']
16:56:21.552 [main] DEBUG c.j.j.internal.path.CompiledPath    .evaluate(47) - Evaluating path: $[0]['warc-header']['warc-target-uri']
java.lang.reflect.InvocationTargetException
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:567)
    at be.ugent.rml.functions.FunctionModel.execute(FunctionModel.java:44)
    at be.ugent.rml.functions.DynamicMultipleRecordsFunctionExecutor.execute(DynamicMultipleRecordsFunctionExecutor.java:83)
    at be.ugent.rml.functions.AbstractSingleRecordFunctionExecutor.execute(AbstractSingleRecordFunctionExecutor.java:16)
    at be.ugent.rml.termgenerator.LiteralGenerator.generate(LiteralGenerator.java:54)
    at be.ugent.rml.functions.DynamicMultipleRecordsFunctionExecutor.lambda$execute$1(DynamicMultipleRecordsFunctionExecutor.java:43)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
    at be.ugent.rml.functions.DynamicMultipleRecordsFunctionExecutor.lambda$execute$4(DynamicMultipleRecordsFunctionExecutor.java:41)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
    at be.ugent.rml.functions.DynamicMultipleRecordsFunctionExecutor.execute(DynamicMultipleRecordsFunctionExecutor.java:28)
    at be.ugent.rml.functions.AbstractSingleRecordFunctionExecutor.execute(AbstractSingleRecordFunctionExecutor.java:16)
    at be.ugent.rml.termgenerator.NamedNodeGenerator.generate(NamedNodeGenerator.java:22)
    at be.ugent.rml.Executor.getSubject(Executor.java:299)
    at be.ugent.rml.Executor.executeWithFunction(Executor.java:95)
    at be.ugent.rml.Executor.execute(Executor.java:78)
    at be.ugent.rml.cli.Main.main(Main.java:289)
    at be.ugent.rml.cli.Main.main(Main.java:36)
Caused by: java.lang.ClassCastException: class java.lang.String cannot be cast to class java.lang.Boolean (java.lang.String and java.lang.Boolean are in module java.base of loader 'bootstrap')
    at io.fno.grel.BooleanFunctions.and(BooleanFunctions.java:16)
    ... 20 more
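
The "Caused by" line suggests that boolean_and receives the nested string_contains results as strings rather than as booleans. Below is a minimal sketch of that failure mode, not the actual io.fno.grel.BooleanFunctions source: the class BooleanAndSketch and both methods are hypothetical, and the assumption that the nested results arrive as the string "true" is mine.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the real GREL function implementation.
final class BooleanAndSketch {

    // Strict variant: assumes every argument is already a java.lang.Boolean.
    // A String argument such as "true" makes the cast throw ClassCastException.
    static boolean strictAnd(List<Object> args) {
        boolean result = true;
        for (Object arg : args) {
            boolean value = (Boolean) arg;
            result = result && value;
        }
        return result;
    }

    // Tolerant variant: accepts Booleans as well as their string form.
    static boolean tolerantAnd(List<Object> args) {
        boolean result = true;
        for (Object arg : args) {
            boolean value = (arg instanceof Boolean)
                    ? (Boolean) arg
                    : Boolean.parseBoolean(String.valueOf(arg));
            result = result && value;
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed shape of the nested function results: strings, not Booleans.
        List<Object> nested = Arrays.<Object>asList("true", "true");

        System.out.println(tolerantAnd(nested)); // prints "true"

        try {
            strictAnd(nested);
        } catch (ClassCastException e) {
            System.out.println("strict variant failed: " + e.getMessage());
        }
    }
}
```

If the real function behaves like strictAnd, converting string arguments before the conjunction (as in tolerantAnd) would avoid the exception; whether that belongs in the function library or in the mapper's parameter handling is an open question.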

YARRRML mapping rules

prefixes:
  idlab-fn: "http://example.com/idlab/function/"
  grel: "http://users.ugent.be/~bjdmeest/function/grel.ttl#"  
  owl: "http://www.w3.org/2002/07/owl#"
  prov: "http://www.w3.org/ns/prov#"
  xsd: "http://www.w3.org/2001/XMLSchema#"
  ex: "http://example.org/ns#"

mappings:

  myMapping:
    sources:
      - access: "failing-message.json"
        referenceFormulation: jsonpath
        iterator: "$.[*]"
    s: ex:myMessage_$(status.id_str)
    condition:
      function: grel:boolean_and
      parameters:
        - parameter: grel:param_rep_b
          value:
            function: grel:string_contains
            parameters:
              - [grel:valueParameter, $(warc-header.warc-type)]
              - [grel:string_sub, "response"]
            datatype: xsd:boolean
        - parameter: grel:param_rep_b
          value:
            function: grel:string_contains
            parameters:
              - [grel:valueParameter, $(warc-header.warc-target-uri)]
              - [grel:string_sub, "show"]
            datatype: xsd:boolean
    po: 
      - [a, prov:Collection]
      - [rdfs:label, "A response containing the string 'show'", en~lang] 

RML mapping rules

@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix fnml: <http://semweb.mmlab.be/ns/fnml#>.
@prefix fno: <https://w3id.org/function/ontology#>.
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.
@prefix : <http://mapping.example.com/>.
@prefix idlab-fn: <http://example.com/idlab/function/>.
@prefix grel: <http://users.ugent.be/~bjdmeest/function/grel.ttl#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix prov: <http://www.w3.org/ns/prov#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix ex: <http://example.org/ns#>.

:map_myMapping_0 rml:logicalSource :source_0.
:source_0 a rml:LogicalSource;
    rml:source "failing-message.json";
    rml:iterator "$.[*]";
    rml:referenceFormulation ql:JSONPath.
:map_myMapping_0 a rr:TriplesMap;
    rdfs:label "myMapping".
:s_0 a rr:SubjectMap.
:map_myMapping_0 rr:subjectMap :s_0.
:s_0 a fnml:FunctionTermMap;
    rr:termType rr:IRI;
    fnml:functionValue :fn_0.
:fn_0 rml:logicalSource :source_0;
    rr:predicateObjectMap :pomexec_0.
:pomexec_0 rr:predicateMap :pmexec_0.
:pmexec_0 rr:constant fno:executes.
:pomexec_0 rr:objectMap :omexec_0.
:omexec_0 rr:constant "http://example.com/idlab/function/trueCondition";
    rr:termType rr:IRI.
:fn_0 rr:predicateObjectMap :pom_0.
:pom_0 a rr:PredicateObjectMap;
    rr:predicateMap :pm_0.
:pm_0 a rr:PredicateMap;
    rr:constant idlab-fn:strBoolean.
:pom_0 rr:objectMap :om_0.
:om_0 a rr:ObjectMap, fnml:FunctionTermMap;
    fnml:functionValue :fn_1.
:fn_1 rml:logicalSource :source_0;
    rr:predicateObjectMap :pomexec_1.
:pomexec_1 rr:predicateMap :pmexec_1.
:pmexec_1 rr:constant fno:executes.
:pomexec_1 rr:objectMap :omexec_1.
:omexec_1 rr:constant "http://users.ugent.be/~bjdmeest/function/grel.ttl#boolean_and";
    rr:termType rr:IRI.
:fn_1 rr:predicateObjectMap :pom_1.
:pom_1 a rr:PredicateObjectMap;
    rr:predicateMap :pm_1.
:pm_1 a rr:PredicateMap;
    rr:constant grel:param_rep_b.
:pom_1 rr:objectMap :om_1.
:om_1 a rr:ObjectMap, fnml:FunctionTermMap;
    rr:datatype xsd:boolean;
    fnml:functionValue :fn_2.
:fn_2 rml:logicalSource :source_0;
    rr:predicateObjectMap :pomexec_2.
:pomexec_2 rr:predicateMap :pmexec_2.
:pmexec_2 rr:constant fno:executes.
:pomexec_2 rr:objectMap :omexec_2.
:omexec_2 rr:constant "http://users.ugent.be/~bjdmeest/function/grel.ttl#string_contains";
    rr:termType rr:IRI.
:fn_2 rr:predicateObjectMap :pom_2.
:pom_2 a rr:PredicateObjectMap;
    rr:predicateMap :pm_2.
:pm_2 a rr:PredicateMap;
    rr:constant grel:valueParameter.
:pom_2 rr:objectMap :om_2.
:om_2 a rr:ObjectMap;
    rml:reference "warc-header.warc-type";
    rr:termType rr:Literal.
:fn_2 rr:predicateObjectMap :pom_3.
:pom_3 a rr:PredicateObjectMap;
    rr:predicateMap :pm_3.
:pm_3 a rr:PredicateMap;
    rr:constant grel:string_sub.
:pom_3 rr:objectMap :om_3.
:om_3 a rr:ObjectMap;
    rr:constant "response";
    rr:termType rr:Literal.
:fn_1 rr:predicateObjectMap :pom_4.
:pom_4 a rr:PredicateObjectMap;
    rr:predicateMap :pm_4.
:pm_4 a rr:PredicateMap;
    rr:constant grel:param_rep_b.
:pom_4 rr:objectMap :om_4.
:om_4 a rr:ObjectMap, fnml:FunctionTermMap;
    rr:datatype xsd:boolean;
    fnml:functionValue :fn_3.
:fn_3 rml:logicalSource :source_0;
    rr:predicateObjectMap :pomexec_3.
:pomexec_3 rr:predicateMap :pmexec_3.
:pmexec_3 rr:constant fno:executes.
:pomexec_3 rr:objectMap :omexec_3.
:omexec_3 rr:constant "http://users.ugent.be/~bjdmeest/function/grel.ttl#string_contains";
    rr:termType rr:IRI.
:fn_3 rr:predicateObjectMap :pom_5.
:pom_5 a rr:PredicateObjectMap;
    rr:predicateMap :pm_5.
:pm_5 a rr:PredicateMap;
    rr:constant grel:valueParameter.
:pom_5 rr:objectMap :om_5.
:om_5 a rr:ObjectMap;
    rml:reference "warc-header.warc-target-uri";
    rr:termType rr:Literal.
:fn_3 rr:predicateObjectMap :pom_6.
:pom_6 a rr:PredicateObjectMap;
    rr:predicateMap :pm_6.
:pm_6 a rr:PredicateMap;
    rr:constant grel:string_sub.
:pom_6 rr:objectMap :om_6.
:om_6 a rr:ObjectMap;
    rr:constant "show";
    rr:termType rr:Literal.
:fn_0 rr:predicateObjectMap :pom_7.
:pom_7 a rr:PredicateObjectMap;
    rr:predicateMap :pm_7.
:pm_7 a rr:PredicateMap;
    rr:constant idlab-fn:str.
:pom_7 rr:objectMap :om_7.
:om_7 a rr:ObjectMap;
    rr:template "http://example.org/ns#myMessage_{status.id_str}";
    rr:termType rr:Literal.
:pom_8 a rr:PredicateObjectMap.
:map_myMapping_0 rr:predicateObjectMap :pom_8.
:pm_8 a rr:PredicateMap.
:pom_8 rr:predicateMap :pm_8.
:pm_8 rr:constant rdf:type.
:pom_8 rr:objectMap :om_8.
:om_8 a rr:ObjectMap;
    rr:constant "http://www.w3.org/ns/prov#Collection";
    rr:termType rr:IRI.
:pom_9 a rr:PredicateObjectMap.
:map_myMapping_0 rr:predicateObjectMap :pom_9.
:pm_9 a rr:PredicateMap.
:pom_9 rr:predicateMap :pm_9.
:pm_9 rr:constant rdfs:label.
:pom_9 rr:objectMap :om_9.
:om_9 a rr:ObjectMap;
    rr:constant "A response containing the string 'show'";
    rr:termType rr:Literal;
    rr:language "en".

Test data

failing-message.json

[{
    "id": 1,
    "id_str": "1",
    "name": "test",
    "status": {
        "id": 2,
        "id_str": "2"
    },
    "warc-header": {
        "warc-type": "response",
        "warc-target-uri": "https://api.twitter.com/1.1/users/show.json?user_id=1&tweet_mode=extended"
    }
}]

SvenLieber commented 3 years ago

For completeness: in my particular case I found a workaround.

If the string that needs to be matched is not embedded in variable content, concatenation combined with a single grel:string_contains call can be used instead. For example:

    s: ex:myMessage_$(status.id_str)
    condition:
      function: grel:string_contains
      parameters:
        - [grel:valueParameter, "$(warc-header.warc-type)_$(warc-header.warc-target-uri)"]
        - [grel:string_sub, "response_https://api.twitter.com/1.1/statuses/show.json"]

In this case the target URI would be something like https://api.twitter.com/1.1/statuses/show.json?user_id=1, where the variable part (user_id) comes after the static part, so the concatenation works. The workaround would similarly work for something like user_id=1,static-part-ending-in-show with the concatenation reversed, but not for user_id=1,show,other_id=2, which has variable content "on both sides".
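
To illustrate the limits of the workaround above, here is a small sketch that emulates the single string_contains call with plain String.contains. The class ConcatWorkaroundSketch and the method matches are hypothetical names, and the "_" separator simply mirrors the template used in the mapping.

```java
// Hypothetical illustration of the concatenation workaround.
final class ConcatWorkaroundSketch {

    // Builds "<warc-type>_<warc-target-uri>" and checks for one fixed substring.
    static boolean matches(String warcType, String targetUri, String needle) {
        String haystack = warcType + "_" + targetUri;
        return haystack.contains(needle);
    }

    public static void main(String[] args) {
        String needle = "response_https://api.twitter.com/1.1/statuses/show.json";
        String uri = "https://api.twitter.com/1.1/statuses/show.json?user_id=1";

        // Static part directly follows the separator: the single check works.
        System.out.println(matches("response", uri, needle)); // prints "true"

        // A different warc-type breaks the combined substring, as intended.
        System.out.println(matches("request", uri, needle));  // prints "false"

        // Variable content on both sides of "show": no fixed needle spans the
        // separator, so the check misses a record that meets both conditions.
        System.out.println(
                matches("response", "user_id=1,show,other_id=2", "response_show")); // prints "false"
    }
}
```

The last call shows why a real conjunction of two string_contains results is still needed in the general case.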

psiotwo commented 1 year ago

Is there any news on this issue? I have a similar problem when trying to chain trueCondition, array_join, grel:boolean_and and grel:boolean_not:

rr:predicateObjectMap [ rr:predicate fno:executes; rr:objectMap [ rr:constant idlab:trueCondition ]; ];
rr:predicateObjectMap [ rr:predicate idlab:str; rr:objectMap [ fnml:functionValue [ rr:predicateObjectMap [ rr:predicate fno:executes; rr:objectMap [ rr:constant grel:array_join ]; ];
rr:predicateObjectMap [ rr:predicate grel:p_array_a; rr:objectMap [ rr:constant "mailto:"  ]; ];
rr:predicateObjectMap [ rr:predicate grel:p_array_a; rr:objectMap [ rml:reference "address" ]; ]; ]; ]; ];
rr:predicateObjectMap [ rr:predicate idlab:strBoolean; rr:objectMap [ fnml:functionValue [ rr:predicateObjectMap [ rr:predicate fno:executes; rr:objectMap [ rr:constant grel:boolean_and ]; ];
rr:predicateObjectMap [ rr:predicateMap [ rr:constant grel:param_rep_b ] ;
rr:objectMap [ fnml:functionValue [ rr:predicateObjectMap [ rr:predicate fno:executes; rr:objectMap [ rr:constant grel:boolean_not ]; ];
rr:predicateObjectMap [ rr:predicate grel:bool_b; rr:objectMap [ rr:constant true ];];];];];