RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

Generation of RDF object for a JSON property which is a reference to a single JSON Node #72

Closed CyberDaedalus00 closed 3 years ago

CyberDaedalus00 commented 4 years ago

I've been trying to map a simple JSON document that contains the following:

{
    "type": "bundle",
    "id": "bundle--0554c315-1893-467b-9362-fe0d1c336cdf",
    "objects": [
        {
            "type": "file", 
            "spec_version": "2.1", 
            "id": "file--e277603e-1060-5ad4-9937-c26c97f1ca68", 
            "hashes": { 
                "SHA-256": "c1112d5167e972c0aea47b762b7f89d0f8df9bd11a611198073bd35b53b98173",
                "MD5": "743207f2a2572a0129c5ddd6fe5eb4bb"
            }, 
            "size": 25536, 
            "name": "foo.dll"
        }
    ]
}

I am using the following RML mapping:

@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

@prefix fnml:   <http://semweb.mmlab.be/ns/fnml#> .
@prefix fno:    <https://w3id.org/function/ontology#> .
@prefix idlab-fn: <http://example.com/idlab/function/> .
@prefix grel:    <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .

@prefix stix: <http://docs.oasis-open.org/cti/ns/stix#> .
@prefix stixCore: <http://docs.oasis-open/cti/ns/stix/core#> .

@prefix file: <http://docs.oasis-open.org/cti/ns/stix/file#> .

@base <http://docs.oasis-open.org/cti/ns/stix/> .

#
# Define SCO specific mapping
#
<http://docs.oasis-open.org/cti/ns/map#FileMapping>
    a rr:TriplesMap ;

    # rml:logicalSource <http://docs.oasis-open.org/cti/ns#LogicalSource> ;
    rml:logicalSource [
        rml:source "stix-document.json" ;
        rml:referenceFormulation ql:JSONPath ;
        rml:iterator "$.objects[?(@.type=='file')]" ;
     ] ;

    rr:subjectMap [
        rr:template "http://cti.oasis-open.org/{id}" ;
        rr:class stix:File
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:type ;
        rr:objectMap [ rml:reference "type" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:spec_version ;
        rr:objectMap [ rr:constant "2.1" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:id ;
        rr:objectMap [ rml:reference "id" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:defanged ;
        rr:objectMap [ 
            rml:reference "defanged" ;
            rr:datatype xsd:boolean ;
        ]
    ] ;

    #
    # Create SCO-specific predicates from this point forward
    #
    rr:predicateObjectMap [
        rr:predicate stixCore:hashes ;
        rr:objectMap [ 
            rml:reference "hashes"; 
            rr:parentTriplesMap <http://docs.oasis-open.org/cti/ns/map#HashesMapping> ;
        ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:name ;
        rr:objectMap [ rml:reference "name" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate file:size ;
        rr:objectMap [ rml:reference "size" ; rr:datatype xsd:nonNegativeInteger ] ;
    ] ;

    rr:predicateObjectMap [
        rr:predicate file:name_enc ;
        rr:objectMap [ rml:reference "name_enc" ]
    ] ;
    rr:predicateObjectMap [
        rr:predicate file:magic_number_hex ;
        rr:objectMap [ rml:reference "magic_number_hex" ; rr:datatype xsd:hexBinary ] ;
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:mime_type ;
        rr:objectMap [ rml:reference "mime_type" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:ctime ;
        rr:objectMap [ rml:reference "ctime" ; rr:datatype xsd:dateTime ] ;
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:time ;
        rr:objectMap [ rml:reference "mtime" ; rr:datatype xsd:dateTime ] ;
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:atime ;
        rr:objectMap [ rml:reference "atime" ; rr:datatype xsd:dateTime ] ;
    ] ;

    rr:predicateObjectMap [
        rr:predicate file:parent_directory_ref ;
        rr:objectMap [
            rr:template "http://cti.oasis-open.org/{parent_directory_ref}" ;
            rr:termType rr:IRI
        ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate file:contains_refs ;
        rr:objectMap [
            rr:template "http://cti.oasis-open.org/{contains_refs}" ;
            rr:termType rr:IRI
        ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate file:content_ref ;
        rr:objectMap [
            rr:template "http://cti.oasis-open.org/{content_ref}" ;
            rr:termType rr:IRI
        ]
    ] ;
.

<http://docs.oasis-open.org/cti/ns/map#HashesMapping>
    a rr:TriplesMap ;

    # rml:logicalSource <http://docs.oasis-open.org/cti/ns#LogicalSource> ;
    rml:logicalSource [
        rml:source "stix-document.json" ;
        rml:referenceFormulation ql:JSONPath ;
        rml:iterator "$.objects[*].hashes" ;
    ] ;

    rr:subjectMap [
        fnml:functionValue [
            rr:predicateObjectMap [
                rr:predicate fno:executes ;
                rr:objectMap [ rr:constant grel:array_join ] ;
            ] ;

            rr:predicateObjectMap [
                rr:predicate grel:p_array_a ;
                rr:objectMap [ rr:constant "hashes--"] ;
            ] ;

            rr:predicateObjectMap [
                rr:predicate grel:p_array_a ;
                rr:objectMap [
                    fnml:functionValue [
                        rr:predicateObjectMap [
                            rr:predicate fno:executes ;
                            rr:objectMap [ rr:constant idlab-fn:random ] ;
                        ] ;
                    ] ;
                ] ;
            ];
        ] ;
        rr:class stixCore:Hashes ;
        rr:termType rr:BlankNode ;
    ] ;

    rml:predicateObjectMap [
        rr:predicate stixCore:MD5_hash_value ;
        rr:objectMap [ rml:reference "MD5" ]
    ] ;

    rml:predicateObjectMap [
        rr:predicate stixCore:SHA-1_hash_value ;
        rr:objectMap [ rml:reference "SHA-1" ]
    ] ;

    rml:predicateObjectMap [
        rr:predicate stixCore:SHA-256_hash_value ;
        rr:objectMap [ rml:reference "SHA-256" ]
    ] ;

    rml:predicateObjectMap [
        rr:predicate stixCore:SHA-512_hash_value ;
        rr:objectMap [ rml:reference "SHA-512" ]
    ] ;

    rml:predicateObjectMap [
        rr:predicate stixCore:SHA3-256_hash_value ;
        rr:objectMap [ rml:reference "SHA3-256" ]
    ] ;

    rml:predicateObjectMap [
        rr:predicate stixCore:SHA3-512_hash_value ;
        rr:objectMap [ rml:reference "SHA3-512" ]
    ] ;

    rml:predicateObjectMap [
        rr:predicate stixCore:SSDEEP_hash_value ;
        rr:objectMap [ rml:reference "SSDEEP" ]
    ] ;

    rml:predicateObjectMap [
        rr:predicate stixCore:TLSH_hash_value ;
        rr:objectMap [ rml:reference "TLSHS" ]
    ] ;
.

However, when the mapper reaches the hashes property, it generates the IRI for the subject that represents the Hashes node but none of its contents, as the generated N-Quads below show.

<http://cti.oasis-open.org/file--e277603e-1060-5ad4-9937-c26c97f1ca68> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://docs.oasis-open.org/cti/ns/stix#File>.
<http://cti.oasis-open.org/file--e277603e-1060-5ad4-9937-c26c97f1ca68> <http://docs.oasis-open/cti/ns/stix/core#hashes> <hashes--b7b49698-e602-4b7a-a93c-6173d93de472>.
<http://cti.oasis-open.org/file--e277603e-1060-5ad4-9937-c26c97f1ca68> <http://docs.oasis-open/cti/ns/stix/core#name> "foo.dll".
<http://cti.oasis-open.org/file--e277603e-1060-5ad4-9937-c26c97f1ca68> <http://docs.oasis-open.org/cti/ns/stix/file#size> "25536"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>.
<http://cti.oasis-open.org/file--e277603e-1060-5ad4-9937-c26c97f1ca68> <http://docs.oasis-open/cti/ns/stix/core#type> "file".
<http://cti.oasis-open.org/file--e277603e-1060-5ad4-9937-c26c97f1ca68> <http://docs.oasis-open/cti/ns/stix/core#spec_version> "2.1".
<http://cti.oasis-open.org/file--e277603e-1060-5ad4-9937-c26c97f1ca68> <http://docs.oasis-open/cti/ns/stix/core#id> "file--e277603e-1060-5ad4-9937-c26c97f1ca68".
<http://docs.oasis-open.org/cti/ns/stix/hashes--b7b49698-e602-4b7a-a93c-6173d93de472> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://docs.oasis-open/cti/ns/stix/core#Hashes>.

I looked through all the test cases in the source tree, but couldn't find anything that replicates what is done with the hashes property. I also looked through the list of issues, and I wasn't sure whether this is related to issue #57, which I see has been closed pending an update to the JSONPath engine used.

Any idea what is going on?

DylanVanAssche commented 3 years ago

It seems that you have a typo in your RML rules :) The following does not exist in the RML namespace:

rml:predicateObjectMap

Since RML is compatible with R2RML, it reuses the PredicateObjectMap from R2RML:

rr:predicateObjectMap
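
As a rough way to catch this kind of typo before running the mapper, here is a minimal Python sketch that flags rml:-prefixed terms outside a small allow-list. The function name find_unknown_rml_terms and the KNOWN_RML_TERMS set are hypothetical; the set is only a partial list of RML terms assumed for illustration, and none of this is part of RMLMapper.

```python
import re

# Partial allow-list of terms from the RML namespace. This is an assumption
# for illustration, not an authoritative copy of the vocabulary.
KNOWN_RML_TERMS = {
    "logicalSource", "source", "referenceFormulation",
    "iterator", "reference", "query",
}

# Matches a prefixed name such as rml:reference and captures the local part.
RML_TERM = re.compile(r"(?<![\w-])rml:([A-Za-z_][\w-]*)")


def find_unknown_rml_terms(mapping_text):
    """Return (line_number, term) pairs for rml: terms not in the allow-list."""
    hits = []
    for number, line in enumerate(mapping_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip lines that are entirely a Turtle comment
        for term in RML_TERM.findall(line):
            if term not in KNOWN_RML_TERMS:
                hits.append((number, term))
    return hits


sample = "\n".join([
    'rml:logicalSource [ rml:iterator "$.objects[*].hashes" ] ;',
    'rml:predicateObjectMap [',
    '    rr:predicate stixCore:MD5_hash_value ;',
    '    rr:objectMap [ rml:reference "MD5" ]',
    '] ;',
])
print(find_unknown_rml_terms(sample))  # [(2, 'predicateObjectMap')]
```

This is a plain text scan, not a Turtle parser, so it only looks at the rml: prefix as written and would miss a mapping that binds the RML namespace to a different prefix.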

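I changed your mapping rules as follows, replacing rml:predicateObjectMap with rr:predicateObjectMap in the HashesMapping and commenting out the rml:reference "hashes" line in the FileMapping: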

@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

@prefix fnml:   <http://semweb.mmlab.be/ns/fnml#> .
@prefix fno:    <https://w3id.org/function/ontology#> .
@prefix idlab-fn: <http://example.com/idlab/function/> .
@prefix grel:    <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .

@prefix stix: <http://docs.oasis-open.org/cti/ns/stix#> .
@prefix stixCore: <http://docs.oasis-open/cti/ns/stix/core#> .

@prefix file: <http://docs.oasis-open.org/cti/ns/stix/file#> .

@base <http://docs.oasis-open.org/cti/ns/stix/> .

#
# Define SCO specific mapping
#
<http://docs.oasis-open.org/cti/ns/map#FileMapping>
    a rr:TriplesMap ;

    # rml:logicalSource <http://docs.oasis-open.org/cti/ns#LogicalSource> ;
    rml:logicalSource [
        rml:source "stix-document.json" ;
        rml:referenceFormulation ql:JSONPath ;
        rml:iterator "$.objects[?(@.type=='file')]" ;
     ] ;

    rr:subjectMap [
        rr:template "http://cti.oasis-open.org/{id}" ;
        rr:class stix:File
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:type ;
        rr:objectMap [ rml:reference "type" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:spec_version ;
        rr:objectMap [ rr:constant "2.1" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:id ;
        rr:objectMap [ rml:reference "id" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:defanged ;
        rr:objectMap [ 
            rml:reference "defanged" ;
            rr:datatype xsd:boolean ;
        ]
    ] ;

    #
    # Create SCO-specific predicates from this point forward
    #
    rr:predicateObjectMap [
        rr:predicate stixCore:hashes ;
        rr:objectMap [ 
            #rml:reference "hashes"; 
            rr:parentTriplesMap <http://docs.oasis-open.org/cti/ns/map#HashesMapping> ;
        ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:name ;
        rr:objectMap [ rml:reference "name" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate file:size ;
        rr:objectMap [ rml:reference "size" ; rr:datatype xsd:nonNegativeInteger ] ;
    ] ;

    rr:predicateObjectMap [
        rr:predicate file:name_enc ;
        rr:objectMap [ rml:reference "name_enc" ]
    ] ;
    rr:predicateObjectMap [
        rr:predicate file:magic_number_hex ;
        rr:objectMap [ rml:reference "magic_number_hex" ; rr:datatype xsd:hexBinary ] ;
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:mime_type ;
        rr:objectMap [ rml:reference "mime_type" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:ctime ;
        rr:objectMap [ rml:reference "ctime" ; rr:datatype xsd:dateTime ] ;
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:time ;
        rr:objectMap [ rml:reference "mtime" ; rr:datatype xsd:dateTime ] ;
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:atime ;
        rr:objectMap [ rml:reference "atime" ; rr:datatype xsd:dateTime ] ;
    ] ;

    rr:predicateObjectMap [
        rr:predicate file:parent_directory_ref ;
        rr:objectMap [
            rr:template "http://cti.oasis-open.org/{parent_directory_ref}" ;
            rr:termType rr:IRI
        ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate file:contains_refs ;
        rr:objectMap [
            rr:template "http://cti.oasis-open.org/{contains_refs}" ;
            rr:termType rr:IRI
        ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate file:content_ref ;
        rr:objectMap [
            rr:template "http://cti.oasis-open.org/{content_ref}" ;
            rr:termType rr:IRI
        ]
    ] ;
.

<http://docs.oasis-open.org/cti/ns/map#HashesMapping>
    a rr:TriplesMap ;

    # rml:logicalSource <http://docs.oasis-open.org/cti/ns#LogicalSource> ;
    rml:logicalSource [
        rml:source "stix-document.json" ;
        rml:referenceFormulation ql:JSONPath ;
        rml:iterator "$.objects[*].hashes" ;
    ] ;

    rr:subjectMap [
        fnml:functionValue [
            rr:predicateObjectMap [
                rr:predicate fno:executes ;
                rr:objectMap [ rr:constant grel:array_join ] ;
            ] ;

            rr:predicateObjectMap [
                rr:predicate grel:p_array_a ;
                rr:objectMap [ rr:constant "hashes--"] ;
            ] ;

            rr:predicateObjectMap [
                rr:predicate grel:p_array_a ;
                rr:objectMap [
                    fnml:functionValue [
                        rr:predicateObjectMap [
                            rr:predicate fno:executes ;
                            rr:objectMap [ rr:constant idlab-fn:random ] ;
                        ] ;
                    ] ;
                ] ;
            ];
        ] ;
        rr:class stixCore:Hashes ;
        rr:termType rr:BlankNode ;
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:MD5_hash_value ;
        rr:objectMap [ rml:reference "MD5" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:SHA-1_hash_value ;
        rr:objectMap [ rml:reference "SHA-1" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:SHA-256_hash_value ;
        rr:objectMap [ rml:reference "SHA-256" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:SHA-512_hash_value ;
        rr:objectMap [ rml:reference "SHA-512" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:SHA3-256_hash_value ;
        rr:objectMap [ rml:reference "SHA3-256" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:SHA3-512_hash_value ;
        rr:objectMap [ rml:reference "SHA3-512" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:SSDEEP_hash_value ;
        rr:objectMap [ rml:reference "SSDEEP" ]
    ] ;

    rr:predicateObjectMap [
        rr:predicate stixCore:TLSH_hash_value ;
        rr:objectMap [ rml:reference "TLSH" ]
    ] ;
.

I then reran the RMLMapper, which gives the desired results:

<http://cti.oasis-open.org/file--e277603e-1060-5ad4-9937-c26c97f1ca68> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://docs.oasis-open.org/cti/ns/stix#File>.
<http://cti.oasis-open.org/file--e277603e-1060-5ad4-9937-c26c97f1ca68> <http://docs.oasis-open/cti/ns/stix/core#hashes> _:hashes--98643a77-9582-4162-8b19-442545bac863.
<http://cti.oasis-open.org/file--e277603e-1060-5ad4-9937-c26c97f1ca68> <http://docs.oasis-open/cti/ns/stix/core#name> "foo.dll".
<http://cti.oasis-open.org/file--e277603e-1060-5ad4-9937-c26c97f1ca68> <http://docs.oasis-open.org/cti/ns/stix/file#size> "25536"^^<http://www.w3.org/2001/XMLSchema#nonNegativeInteger>.
<http://cti.oasis-open.org/file--e277603e-1060-5ad4-9937-c26c97f1ca68> <http://docs.oasis-open/cti/ns/stix/core#type> "file".
<http://cti.oasis-open.org/file--e277603e-1060-5ad4-9937-c26c97f1ca68> <http://docs.oasis-open/cti/ns/stix/core#spec_version> "2.1".
<http://cti.oasis-open.org/file--e277603e-1060-5ad4-9937-c26c97f1ca68> <http://docs.oasis-open/cti/ns/stix/core#id> "file--e277603e-1060-5ad4-9937-c26c97f1ca68".
_:hashes--98643a77-9582-4162-8b19-442545bac863 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://docs.oasis-open/cti/ns/stix/core#Hashes>.
_:hashes--98643a77-9582-4162-8b19-442545bac863 <http://docs.oasis-open/cti/ns/stix/core#MD5_hash_value> "743207f2a2572a0129c5ddd6fe5eb4bb".
_:hashes--98643a77-9582-4162-8b19-442545bac863 <http://docs.oasis-open/cti/ns/stix/core#SHA-256_hash_value> "c1112d5167e972c0aea47b762b7f89d0f8df9bd11a611198073bd35b53b98173".
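
Only the MD5 and SHA-256 triples appear because those are the only keys present in the hashes object; as far as I understand, a reference that resolves to nothing simply produces no triple. Here is a minimal Python sketch of that behaviour, using plain dictionary access as a stand-in for the "$.objects[*].hashes" iterator. It is not how RMLMapper evaluates JSONPath, resolved_references is a hypothetical helper, and the references list is just a subset of those in the HashesMapping.

```python
# The relevant part of the sample STIX bundle from the question.
document = {
    "type": "bundle",
    "objects": [
        {
            "type": "file",
            "id": "file--e277603e-1060-5ad4-9937-c26c97f1ca68",
            "hashes": {
                "SHA-256": "c1112d5167e972c0aea47b762b7f89d0f8df9bd11a611198073bd35b53b98173",
                "MD5": "743207f2a2572a0129c5ddd6fe5eb4bb",
            },
            "size": 25536,
            "name": "foo.dll",
        }
    ],
}

# Plain-Python approximation of the iterator "$.objects[*].hashes":
# one iteration per object that actually has a "hashes" member.
hash_nodes = [obj["hashes"] for obj in document["objects"] if "hashes" in obj]

# A subset of the references used in the HashesMapping.
references = ["MD5", "SHA-1", "SHA-256", "SHA-512"]


def resolved_references(node, references):
    """Return {reference: value} for the references present in the node."""
    return {ref: node[ref] for ref in references if ref in node}


for node in hash_nodes:
    # Only MD5 and SHA-256 resolve; SHA-1 and SHA-512 are absent and skipped.
    print(resolved_references(node, references))
```

The remaining predicate-object maps (SHA-1, SHA-512, and so on) stay in the mapping harmlessly and start producing triples once a document contains those keys.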