RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high-quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

Processing Multivalue References in CSV #173

Open tobiasschweizer opened 2 years ago

tobiasschweizer commented 2 years ago

Hi there

I am trying to create linking property values (one per project) from a CSV field that contains several concatenated foreign keys.

CSV data source: https://data.snf.ch/Exportcsv/Person.csv

mapping:

@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix rml: <http://semweb.mmlab.be/ns/rml#>.
@prefix ql: <http://semweb.mmlab.be/ns/ql#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix schema: <http://schema.org/>.
@prefix wgs84_pos: <http://www.w3.org/2003/01/geo/wgs84_pos#>.
@prefix gn: <http://www.geonames.org/ontology#>.
@prefix carml: <http://carml.taxonic.com/carml/> .
@prefix fnml: <http://semweb.mmlab.be/ns/fnml#> .
@prefix grel: <http://users.ugent.be/~bjdmeest/function/grel.ttl#> .
@prefix fno: <https://w3id.org/function/ontology#> .
@base <http://example.com/ns#>.

<#LogicalSourcePerson> a rml:BaseSource ;
  rml:source <#CSVW_sourcePerson> ;
  rml:referenceFormulation ql:CSV .

<#CSVW_sourcePerson> a csvw:Table;
   csvw:url "Person.csv" ;
   csvw:dialect [ a csvw:Dialect;
       csvw:delimiter ";"
   ] .

<#PersonMapping> a rr:TriplesMap;
  rml:logicalSource <#LogicalSourcePerson> ;

  rr:subjectMap [
    rr:template "http://snf.ch/person/{PersonNumber}";
    rr:class schema:Person
  ] ;

  rr:predicateObjectMap [
    rr:predicate schema:memberOf ;
    rr:objectMap <#JoinMap> ;
  ] ;

  rr:predicateObjectMap [
    rr:predicate schema:givenName ;
    rr:objectMap [
      rml:reference "FirstName"
    ]
  ] ;

  rr:predicateObjectMap [
    rr:predicate schema:familyName ;
    rr:objectMap [
      rml:reference "Surname"
    ]
  ] .

<#JoinMap>
    fnml:functionValue [
        rr:predicateObjectMap [
            rr:predicate fno:executes ;
            rr:objectMap [ rr:constant grel:array_join ]
        ];
        rr:predicateObjectMap [
            rr:predicate grel:p_array_a ;
            rr:objectMap [ rr:constant "http://snf.ch/project/" ]
        ];
        rr:predicateObjectMap [
            rr:predicate grel:p_array_a ;
            rr:objectMap <#FunctionMap>
        ];
    ] .

# https://stackoverflow.com/questions/53715353/converting-a-csv-to-rdf-where-one-column-is-a-set-of-values
<#FunctionMap>
    fnml:functionValue [
        rml:logicalSource <#LogicalSourceGrant>;
        rr:predicateObjectMap [
            rr:predicate fno:executes;
            rr:objectMap [
                rr:constant grel:string_split # function to use
            ];
        ];
        rr:predicateObjectMap [
            rr:predicate grel:valueParameter;
            rr:objectMap [
                rml:reference "ResponsibleApplicantGrantNumber" # input string: concatenated foreign keys
            ];
        ];
        rr:predicateObjectMap [
            rr:predicate grel:p_string_sep;
            rr:objectMap [
                rr:constant ";";
            ];
        ];
    ].
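To make the function chain easier to follow, here is a rough Python analogue of what <#FunctionMap> and <#JoinMap> compute. It is only an illustration, not RMLMapper code, and it rests on assumptions: the ResponsibleApplicantGrantNumber cell holds the keys separated by ";" (the cell value below is inferred from the result), both grel:p_array_a inputs end up in one flat list, and grel:array_join uses an empty separator because the mapping gives no grel:p_string_sep. The helper names are hypothetical.

```python
# Hypothetical analogue of the <#FunctionMap> + <#JoinMap> chain (not RMLMapper code).

PREFIX = "http://snf.ch/project/"  # constant passed as the first grel:p_array_a


def string_split(value: str, sep: str) -> list[str]:
    """Rough stand-in for grel:string_split (grel:valueParameter, grel:p_string_sep)."""
    return value.split(sep)


def array_join(items: list[str], sep: str = "") -> str:
    """Rough stand-in for grel:array_join; assumes an empty separator
    because the mapping does not supply grel:p_string_sep."""
    return sep.join(items)


def join_map(cell: str) -> str:
    # Assumption: the constant prefix and the split keys are merged into one flat list.
    parts = string_split(cell, ";")
    return array_join([PREFIX] + parts)


if __name__ == "__main__":
    # Assumed (illustrative) content of the ResponsibleApplicantGrantNumber cell.
    print(join_map("111925;66731;55836;34468"))
```

Under those assumptions the chain returns a single concatenated string of the same shape as the result shown, i.e. one value rather than one value per key.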

result:

  "http://schema.org/memberOf" : [ {
    "@value" : "http://snf.ch/project/111925667315583634468"
  } ]

expected result:

  "http://schema.org/memberOf" : [ {
    "@value" : "http://snf.ch/project/111925"
  }, {
    "@value" : "http://snf.ch/project/34468"
  }, {
    "@value" : "http://snf.ch/project/55836"
  }, {
    "@value" : "http://snf.ch/project/66731"
  } ]
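The expected result instead needs the prefix applied to each key on its own. A minimal Python sketch of that per-key mapping, using a hypothetical helper and the same assumed cell value as input:

```python
# Hypothetical sketch: one prefixed value per foreign key instead of one joined string.

PREFIX = "http://snf.ch/project/"


def project_iris(cell: str, sep: str = ";") -> list[str]:
    """Split the multivalue cell and prefix every non-empty key separately."""
    return [PREFIX + key.strip() for key in cell.split(sep) if key.strip()]


if __name__ == "__main__":
    for iri in project_iris("111925;66731;55836;34468"):
        print(iri)
```

Each key yields its own value, so four keys produce four memberOf values, which is the shape of the expected result.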

For more details, see https://github.com/kg-construct/rml-questions/discussions/15#discussioncomment-2991734

DylanVanAssche commented 2 years ago

The problem is that the same delimiter (;) is used both to separate the columns and to separate the multiple values within a field. To avoid confusion, such fields are quoted. However, it seems that the OpenCSV library used by the RMLMapper does not pick this up.
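The quoting behaviour can be illustrated with Python's standard csv module (this is not the OpenCSV library the RMLMapper uses). The two-line CSV below is hypothetical: it borrows column names from the mapping, uses ";" as the column delimiter as in the CSVW dialect, and quotes the multivalue field. A quote-aware parser keeps that field as one cell, whereas splitting on ";" without honouring the quotes breaks it apart.

```python
import csv
import io

# Hypothetical sample in the style of Person.csv: ';' separates columns, and the
# multivalue field is quoted because it also contains ';'.
SAMPLE = (
    "PersonNumber;ResponsibleApplicantGrantNumber;Surname\n"
    '1;"111925;66731;55836;34468";Example\n'
)


def parse_quote_aware(text: str) -> list[list[str]]:
    # csv.reader honours quotechar, so the quoted field stays a single cell.
    return list(csv.reader(io.StringIO(text), delimiter=";", quotechar='"'))


def parse_naive(text: str) -> list[list[str]]:
    # Plain splitting ignores quoting and splits inside the quoted field.
    return [line.split(";") for line in text.splitlines()]


if __name__ == "__main__":
    print(parse_quote_aware(SAMPLE)[1])  # 3 cells, quotes removed
    print(parse_naive(SAMPLE)[1])        # 6 cells, field broken apart
```

With a quote-aware parser, the single cell can then be handed to grel:string_split as intended.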

tobiasschweizer commented 2 years ago


Yes, this is how I understand it works in CSV: quoting is like escaping characters that have a special meaning (metacharacters). Maybe we could look at the library you mentioned or create an issue in its repository. Let me know if I can be of any assistance.