Providing "Virtual" Mappings

tobiasschweizer commented 2 years ago

I noticed that CARML supports streams like stdin which is very useful because no fixed file name ends up in the mapping.

Will there also be the possibility to provide additional "virtual" mappings for handling file names? From the docs I can see that it is possible to provide more than one mapping.

The main mapping would contain one or several logical sources without an rml:source and an additional mapping would provide the rml:source, see https://github.com/RMLio/rmlmapper-java/issues/97#issuecomment-781224985.

Would it also be possible to provide mappings inline, e.g. -m "<#myLogicalSource> rml:source \"file.json\" ."?

pmaria commented 2 years ago

> I noticed that CARML supports streams like stdin which is very useful because no fixed file name ends up in the mapping.
>
> Will there also be the possibility to provide additional "virtual" mappings for handling file names? From the docs I can see that it is possible to provide more than one mapping.
>
> The main mapping would contain one or several logical sources without an rml:source and an additional mapping would provide the rml:source, see RMLio/rmlmapper-java#97 (comment).

Yes, this is possible. When multiple mapping files are provided, they are first combined into one Model and then mapped to mapping objects internally. So you could organize your files how you wish.
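
As a rough illustration of the merge step described above (this is not CARML's actual implementation), the sketch below treats each mapping file as a set of (subject, predicate, object) tuples and unions them before anything is interpreted. The `merge_models` helper, the tuple representation, and the reference formulation and iterator values are assumptions made for the example; a real implementation would parse the files with an RDF library.

```python
# Toy stand-in for an RDF model: a set of (subject, predicate, object) tuples.
Triple = tuple[str, str, str]

# "Core" mapping: a logical source that deliberately has no rml:source.
core_mapping: set[Triple] = {
    ("<#myLogicalSource>", "rml:referenceFormulation", "ql:JSONPath"),
    ("<#myLogicalSource>", "rml:iterator", '"$"'),
}

# "Virtual" mapping: only supplies the missing rml:source.
source_mapping: set[Triple] = {
    ("<#myLogicalSource>", "rml:source", '"file.json"'),
}


def merge_models(*models: set[Triple]) -> set[Triple]:
    """Union all statements; which file a statement came from no longer matters."""
    merged: set[Triple] = set()
    for model in models:
        merged |= model
    return merged


merged = merge_models(core_mapping, source_mapping)
predicates = sorted(p for s, p, o in merged if s == "<#myLogicalSource>")
print(predicates)
```

Because the statements are unioned before interpretation, the logical source ends up complete even though no single file describes it fully.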

> Would it also be possible to provide mappings inline, e.g. -m "<#myLogicalSource> rml:source \"file.json\" ."?

We could make something like that possible, sure. The above way is a bit tricky though. For instance, which RDF format do you support for specifying the extra triples? Only N-Quads?
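
To make the format question concrete, here is a minimal sketch of a toy command line (not the CARML CLI) that accepts an inline statement. The `--inline-format` option, the `to_document` helper, and the base IRI are all hypothetical. The point it tries to show: a snippet using `rml:source` and `<#myLogicalSource>` only parses as Turtle if a prefix declaration and a base matching the main mapping are supplied, whereas N-Triples or N-Quads would require full IRIs in the snippet itself.

```python
import argparse

# Namespace of the RML vocabulary, needed to resolve the "rml:" prefix.
RML_PREFIX = "@prefix rml: <http://semweb.mmlab.be/ns/rml#> ."


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Toy CLI, not the CARML CLI")
    # Hypothetical: -m takes an inline statement instead of a file name.
    parser.add_argument("-m", "--mapping", action="append", default=[])
    # Hypothetical option answering "which RDF format?".
    parser.add_argument("--inline-format", choices=["turtle", "ntriples"],
                        default="turtle")
    return parser


def to_document(snippet: str, fmt: str, base: str) -> str:
    """Turn an inline snippet into a standalone document for a parser."""
    if fmt == "ntriples":
        # N-Triples has no prefixes or relative IRIs; the snippet must
        # already contain full IRIs, so it is passed through unchanged.
        return snippet
    # Turtle: add the base and prefix the snippet silently relies on.
    return "\n".join([f"@base <{base}> .", RML_PREFIX, snippet])


args = build_parser().parse_args(
    ["-m", '<#myLogicalSource> rml:source "file.json" .']
)
doc = to_document(args.mapping[0], args.inline_format,
                  base="http://example.org/mapping")
print(doc)
```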

Another approach I was thinking of recently to achieve something similar would be to plug in a template engine with which to template the mapping files, and then provide the template mapping via a CLI option.

tobiasschweizer commented 2 years ago

> For instance, which RDF format do you support for specifying the extra triples? Only n-quads?

Good question. For now, I have written my mappings in Turtle.

> Another approach I was thinking of recently to achieve something similar would be to plug in a template engine with which to template the mapping files, and then provide the template mapping via a CLI option.

I was actually thinking about using a template engine to generate the mapping files. So far, I have two mappings for two target types (schema:Book and schema:ScholarlyArticle) and I would like to reduce the two mappings to one template to avoid redundancy. Templating might also make some RML functions unnecessary, e.g. a function that chooses the target type depending on a property's value.
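
A minimal sketch of the "one template, two target types" idea, using Python's standard-library `string.Template` as a stand-in for a full template engine. The map names, the `http://example.org/...` IRI pattern, and the variable names are assumptions for illustration, and the fragment only shows the subject map rather than a complete triples map.

```python
from string import Template

# One template replaces two near-identical mapping files.
# ${...} is filled in by the template step; {id} is left for the RML engine.
MAPPING_TEMPLATE = Template("""\
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix schema: <http://schema.org/> .

<#${name}Map>
    rr:subjectMap [
        rr:template "http://example.org/${path}/{id}" ;
        rr:class schema:${target_type}
    ] .
""")

# Hypothetical per-target settings.
TARGETS = [
    {"name": "Book", "path": "book", "target_type": "Book"},
    {"name": "Article", "path": "article", "target_type": "ScholarlyArticle"},
]

# Render one mapping per target type from the same template.
mappings = {t["target_type"]: MAPPING_TEMPLATE.substitute(t) for t in TARGETS}
print(mappings["ScholarlyArticle"])
```

Since the target type is fixed when the template is rendered, the rendered mapping needs no function to pick a class at mapping time.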

tobiasschweizer commented 2 years ago

Hi there,

I am finally coming back to this :-)

I've tried keeping things in two different mapping files which works fine:

1. The "core" mapping instructions with references to logical sources
2. The definition of the logical sources' actual rml:source, e.g. a file or the CARML stdin.

I would like to further evaluate the two options discussed above:

1. Providing an inline mapping from the command line
2. Plugging in a template engine

I'd like to explore option one for now as it seems quite straightforward. I'll have to think about the point you raised concerning the RDF serialisation format.

I suppose I would need some guidance at some point. Would that be ok for you?

pmaria commented 2 years ago

@tobiasschweizer Ok great. Thinking about this a bit more, I'm leaning more towards option 2 as a preference. Because:

- if you have multiple places where you want to use the same value in your mapping, you could reuse a template variable.
- it does not bind you to a specific syntax. For example, if we were to support YARRRML mappings, or any other non-RDF syntax, it could work with the same interface.

The downside of course is that this is not standardized in any way.

tobiasschweizer commented 2 years ago

> @tobiasschweizer Ok great. Thinking about this a bit more, I'm leaning more towards option 2 as a preference. Because:
>
> - if you have multiple places where you want to use the same value in your mapping, you could reuse a template variable.
> - it does not bind you to a specific syntax. For example, if we were to support YARRRML mappings, or any other non-RDF syntax, it could work with the same interface.
>
> The downside of course is that this is not standardized in any way.

Yes, I see your point. Templates would be extremely helpful. However, just to get some grip on CARML Jar I'd like to try to figure out what I can do myself for option one.

I am on vacation next week but I'll be back the week after. Let me know if I can help with anything regarding the templates. Do you already have an engine in mind? I used to work with Twirl (Scala) for SPARQL queries. This worked quite well.

In any case, using a template engine could be thought of as a single, separate step before using the RML engine. So if this could be cleanly abstracted out, maybe other RML engines could pick it up as well?

pmaria commented 2 years ago

> Yes, I see your point. Templates would be extremely helpful. However, just to get some grip on CARML Jar I'd like to try to figure out what I can do myself for option one.

Cool!

> I am on vacation next week but I'll be back the week after. Let me know if I can help with anything regarding the templates. Do you already have an engine in mind? I used to work with Twirl (Scala) for SPARQL queries. This worked quite well.

OK. I've used Pebble Templates in a couple of cases, and it works quite nicely, and is pretty customizable, yet simple to implement.

> In any case, using a template engine could be thought of as a single, separate step before using the RML engine. So if this could be cleanly abstracted out, maybe other RML engines could pick it up as well?

Hmm, that's interesting, but possibly tricky. Would have to think about how best to do this.

In any case I would want the templating to be a separate module, also keeping the relevant option specification separate from the core stuff.
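
One way to picture templating as a separate, engine-agnostic step is sketched below: a small function renders the template to an ordinary mapping file, and only that file is handed to whichever RML engine runs next. The `render_mapping` helper, the output file name, and the `source_file` variable are hypothetical, and `string.Template` again stands in for a real template engine such as Pebble.

```python
import tempfile
from pathlib import Path
from string import Template


def render_mapping(template_text: str, variables: dict[str, str],
                   out_dir: Path) -> Path:
    """Step 1: render the template into a plain mapping file."""
    # substitute() raises KeyError if a variable is missing, so an
    # incomplete mapping never reaches the engine.
    rendered = Template(template_text).substitute(variables)
    out_path = out_dir / "mapping.rml.ttl"
    out_path.write_text(rendered, encoding="utf-8")
    return out_path


template_text = '<#myLogicalSource> rml:source "${source_file}" .\n'

with tempfile.TemporaryDirectory() as tmp:
    mapping_path = render_mapping(
        template_text, {"source_file": "file.json"}, Path(tmp)
    )
    rendered_text = mapping_path.read_text(encoding="utf-8")
    # Step 2 (not shown): pass mapping_path to the RML engine of your choice.

print(rendered_text)
```

Keeping the rendered file as the only interface between the two steps means the engine never needs to know a template was involved.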

tobiasschweizer commented 1 year ago

I am experimenting with a template engine in Python (jinja2) to generate several mappings for different providers from the same source. The mappings differ in their IRIs, and this seems quite easy to handle with a template engine. I also tried conditional subject maps in RML, though only in YAML (https://github.com/kg-construct/rml-questions/discussions/17), but I cannot recommend this approach.

Template engines also make it easy to stay consistent when avoiding logical joins, since the same IRI then has to be generated in several places.
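
The consistency point can be sketched as follows, with standard-library `string.Template` standing in for jinja2. When a join is avoided, the person IRI pattern appears both as a subject template and as an object template; defining it once per provider keeps the two in sync. Provider names, base IRIs, and map names are made up for the example, and prefix declarations are omitted.

```python
from string import Template

# The person IRI pattern occurs twice: as the object of schema:author
# and as the subject of the person map. Both come from one variable.
FRAGMENT = Template("""\
<#PublicationMap>
    rr:subjectMap [ rr:template "${publication_iri}" ] ;
    rr:predicateObjectMap [
        rr:predicate schema:author ;
        rr:objectMap [ rr:template "${person_iri}" ]
    ] .

<#PersonMap>
    rr:subjectMap [ rr:template "${person_iri}" ] .
""")

# Hypothetical providers that differ only in their base IRI.
PROVIDERS = {
    "provider-a": "https://a.example.org",
    "provider-b": "https://b.example.org",
}

rendered = {
    name: FRAGMENT.substitute(
        publication_iri=f"{base}/publication/{{id}}",
        person_iri=f"{base}/person/{{author_id}}",
    )
    for name, base in PROVIDERS.items()
}
print(rendered["provider-a"])
```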

I would be happy to share my insights once I have finished the first iteration.

carml / carml-jar

Providing "Virtual" Mappings #10