RMLio / RML-Mapper

Generate High Quality Linked Data from multiple originally (semi-)structured data (legacy)
http://RML.io

normalization in RML #12

Closed m1ci closed 6 years ago

m1ci commented 7 years ago

Hi,

Is it possible to perform normalization in RML? For example, when constructing URIs we want to normalize the names.

Example: we want something like http://linked-web-apis.fit.cvut.cz/resource/supportbee-api but we get http://linked-web-apis.fit.cvut.cz/resource/SupportBee%20API_api

thanks, Milan

seralf commented 7 years ago

Hi

In my team we have a similar (and recurring) use case. I think the best way to accomplish customized normalization is to add the ability to plug in a custom function for manipulating content.

With the same contract one could also generate other kinds of "computed" values (for example sums, tokenized values, or rowid, as I suggested here: https://github.com/RMLio/RML-DataRetrieval/issues/5)

bjdmeest commented 7 years ago

Current work involves aligning RML with FnO and extending the processor to allow custom data processing instructions (see https://github.com/RMLio/RML-Mapper/tree/extension-fno).

It uses an extension to connect with functions, see the example below:

<#Person_Mapping>
  rml:logicalSource <#LogicalSource> ; # Specify the data source
  rr:subjectMap <#SubjectMap> ; # Specify the subject
  rr:predicateObjectMap <#NameMapping> . # Specify the predicate-object-map

<#NameMapping>
  rr:predicate dbo:title ; # Specify the predicate
  rr:objectMap <#FunctionMap> . # Specify the object-map

<#FunctionMap>
  fnml:functionValue [ # The object is the result of the function
    rml:logicalSource <#LogicalSource> ; # Use the same data source for input
    rr:predicateObjectMap [
      rr:predicate fno:executes ; # Execute `grel:titleCase`
      rr:objectMap [ rr:constant grel:titleCase ] ] ;
    rr:predicateObjectMap [
      rr:predicate grel:inputString ;
      rr:objectMap [ rr:reference "name" ] ] # Use as input the "name" reference
  ] .

In its current state, additional processing instructions can be added by including .java or .jar files in resources/functions and updating their metadata in resources/functions/metadata.json. The folder should be placed relative to the current working directory (see examples in the root of this project, branch extension-fno). Some implementations are available at https://github.com/fnoio

This work will be explained in more detail at ESWC 2017: https://ruben.verborgh.org/publications/demeester_eswc_2017/
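For the normalization asked about at the top of this thread, the custom function would need to do roughly the following. This is only a sketch of the transformation logic in Python, not a drop-in function: the processor loads .java or .jar files, so the logic would have to be ported, and the name normalize_name is hypothetical. It assumes "normalized" means lowercase with every run of non-alphanumeric characters collapsed into a single hyphen.

```python
import re


def normalize_name(value: str) -> str:
    """Turn a free-text name into a URI-friendly slug (hypothetical helper).

    Assumption: lowercase ASCII letters and digits are kept, and any run of
    other characters (spaces, punctuation, non-ASCII letters) becomes one hyphen.
    """
    lowered = value.lower()
    # Collapse every run of characters outside a-z and 0-9 into a single hyphen.
    hyphenated = re.sub(r"[^a-z0-9]+", "-", lowered)
    # Drop hyphens left over at the start or end of the string.
    return hyphenated.strip("-")


if __name__ == "__main__":
    print(normalize_name("SupportBee API"))
```

With this logic the name "SupportBee API" becomes "supportbee-api", which can then be appended to the resource base URI without percent-encoding.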

andimou commented 7 years ago

To add to the above comment: the RML processor, and in particular its core module, is meant to be generic. A certain data value is provided and used to generate an RDF term. How this data value is extracted, and whether it is processed along the way, is not the concern of the mapping function.

RML and the RML Processor target a modular approach that allows data to be retrieved through different interfaces, which may differ between data providers. Each data provider is responsible for describing its data access interfaces.

Similarly, the desired data transformations differ among data owners. It therefore makes sense that each data owner "plugs in" its own data transformations, or reuses existing ones if they fit its needs. In this context, the core RML processor does not normalize values, just as R2RML does not.

pheyvaer commented 6 years ago

Hi,

We've updated the RMLMapper and it can now be found here. I think that the last reply by @bjdmeest solved the issue. If not, you can open a new issue at the repo of the new RMLMapper.