RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

One mapping file for multiple input source files #161

Closed. tjroamer closed this issue 2 years ago

tjroamer commented 2 years ago

I have thousands of XML files that share the same structure, so a single RML file is enough to process all of them. To use RML, however, I have to hard-code the XML file name in the RML mapping file. I am wondering whether the mapping file can be generalized so that it works for multiple XML files in a configurable way. Currently, I use a wrapper script that copies each XML file to a temporary name, which is hard-coded in the RML file as rml:source. This is quite ugly and does not scale.

Any suggestions would be much appreciated. Thanks.

DylanVanAssche commented 2 years ago

Hi @tjroamer !

RML mapping rules are RDF, so you can edit them programmatically as well. If you read the RDF with an RDF library such as RDFLib in Python or RDF/JS in Node, you can programmatically generate RML mapping rules and pass them to the RMLMapper.

If you have any questions, feel free to reach out!

tjroamer commented 2 years ago

Thanks for your quick reply! Yes, the way you suggested would work as well. For my use case, it would mean generating a lot of temporary RML files that differ only in the XML file name.

Wouldn't it be a nice feature for the RMLMapper to let the user pass an XML file name as a variable? I see that YARRRML already supports external variables, but unfortunately that does not work for the source file name.

bjdmeest commented 2 years ago

RMLMapper-java accepts multiple mapping files/strings on its CLI, which it concatenates. Another option is to keep the source description out of the mapping file and add that triple when invoking the CLI, something like:

java -jar rmlmapper-java.jar -m "./mapping-no-source-description.rml.ttl" -m "<http://mapping.ex.com/myLogicalSource> <http://semweb.mmlab.be/ns/rml#source> \"/path/to/one/of/the/thousand/xml/files.xml\" ."

tjroamer commented 2 years ago

@bjdmeest that's really cool! This is what I needed. Many thanks for your help!

uxapj commented 9 months ago

I have tried at length the approach bjdmeest suggested above (keeping the source description out of the mapping file and passing the rml:source triple as a second -m argument), and I constantly get:

Exception in thread "main" java.lang.Error: The Logical Source does not have a source.

Of course I do not use http://mapping.ex.com/myLogicalSource, as my RML looks like this:

@prefix : <http://example.org/rules/> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .

:TriplesMap a rr:TriplesMap;
    rml:logicalSource [
        rml:referenceFormulation ql:JSONPath;
        rml:iterator "$"
    ] .

But no matter what I do, this does not seem to work. Are you sure this is still supported?

bjdmeest commented 9 months ago

It would be better to make the logical source in your original mapping file the named node http://mapping.ex.com/myLogicalSource instead of a blank node. Otherwise there's no way the engine can link the two mapping files together. See below.

@prefix : <http://example.org/rules/> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .

# The logical source is a named node, so a triple passed via a second -m can refer to it.
:TriplesMap a rr:TriplesMap;
    rml:logicalSource <http://mapping.ex.com/myLogicalSource> .

<http://mapping.ex.com/myLogicalSource> rml:referenceFormulation ql:JSONPath;
    rml:iterator "$" .

uxapj commented 9 months ago

Ah you are so right, thank you so much for your help! Now this works as expected!!
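
For the original use case of thousands of input files, the approach worked out in this thread could be wrapped in a small script. The following is only a sketch under stated assumptions: the mapping file names its logical source http://mapping.ex.com/myLogicalSource as shown above, the helper names (source_triple, build_command, map_directory) and the ".nq" output naming are hypothetical, and the -o output option is assumed from the RMLMapper CLI help rather than taken from this thread.

```python
import subprocess
from pathlib import Path
from typing import List

# Predicate used in the thread to attach a source to a logical source.
RML_SOURCE = "http://semweb.mmlab.be/ns/rml#source"


def source_triple(logical_source_iri: str, data_file: str) -> str:
    """Build the one-line triple that is passed as the second -m argument."""
    # Escape backslashes and double quotes so the path stays a valid literal.
    escaped = data_file.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{logical_source_iri}> <{RML_SOURCE}> "{escaped}" .'


def build_command(jar: str, mapping: str, logical_source_iri: str,
                  data_file: str, output_file: str) -> List[str]:
    """Assemble the argument list for one RMLMapper run (nothing is executed)."""
    return [
        "java", "-jar", jar,
        "-m", mapping,                                       # mapping without rml:source
        "-m", source_triple(logical_source_iri, data_file),  # per-file source triple
        "-o", output_file,                                   # assumed output option
    ]


def map_directory(jar: str, mapping: str, logical_source_iri: str,
                  data_dir: str, out_dir: str) -> None:
    """Run the mapper once per *.xml file found in data_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for data_file in sorted(Path(data_dir).glob("*.xml")):
        # Hypothetical naming scheme: one output file per input file.
        target = out / (data_file.stem + ".nq")
        cmd = build_command(jar, mapping, logical_source_iri,
                            str(data_file), str(target))
        subprocess.run(cmd, check=True)  # raises if the mapper exits non-zero
```

Because the arguments are passed to subprocess.run as a list, no shell is involved, so the quotes inside the triple do not need the backslash escaping used in the command line earlier in the thread. A call such as map_directory("rmlmapper-java.jar", "./mapping-no-source-description.rml.ttl", "http://mapping.ex.com/myLogicalSource", "./xml", "./out") would then process every XML file in ./xml, where the two directory names are placeholders.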