RMLio / rml-implementation-report

Implementation report for RML tools

multiple object values? #11

Closed VladimirAlexiev closed 5 years ago

VladimirAlexiev commented 5 years ago

The ability to produce multiple values is a crucial function for any source that supports multi-value "fields" (e.g. JSON and XML). CARML has a multiReference extension for this. But I've argued that no syntax extension is needed: it's enough to specify this behavior in RML (https://github.com/carml/carml/issues/52).

@andimou, @pheyvaer: What do you think about this? Is a syntactic extension needed to produce multiple object values or not? If not, does the test suite include some multi-value mappings?

Cheers!

andimou commented 5 years ago

Most probably we didn't clarify this well in the specification.

The RML specification does not restrict the number of values that are returned for a certain rml:reference or a reference included in rr:template.

This is a consequence of supporting other formats. R2RML works with tables in relational databases, so it is normal to expect that one value is returned for a given reference (rr:column).

However, RML works with other formats, including e.g. XML. There, an XPath expression might be evaluated over multiple elements and thus return multiple values. RML needs to support these cases by generating multiple RDF terms, one for each returned value.
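The behavior described above can be illustrated with a minimal Python sketch. It is not an RML processor: the XML document, the `http://example.org/` IRIs, and the helper names `object_values` and `triples_for` are all hypothetical, and the standard library's ElementTree path support stands in for a full XPath engine. The point is only that a reference returning several values yields one triple per value.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: one <article> record with a multi-valued <author> field.
XML = """
<articles>
  <article id="a1">
    <author>Alice</author>
    <author>Bob</author>
  </article>
</articles>
"""


def object_values(record, reference):
    """Evaluate a reference (an ElementTree path here) and return every match."""
    return [el.text for el in record.findall(reference)]


def triples_for(record, predicate, reference):
    """Generate one triple per value returned for the object reference."""
    # Assumption: the subject IRI is built from the record's id attribute.
    subject = f"http://example.org/article/{record.get('id')}"
    return [(subject, predicate, value) for value in object_values(record, reference)]


root = ET.fromstring(XML)
triples = []
for record in root.findall("article"):  # plays the role of the iterator
    triples.extend(triples_for(record, "http://example.org/author", "author"))

for triple in triples:
    print(triple)
```

With a relational source the list returned by `object_values` would always have length one, which is why R2RML never had to address the question.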

VladimirAlexiev commented 5 years ago

Then @pmaria I think it's best if CARML removes the carml:multiReference extension, because it's not needed.

@andimou

@pheyvaer Are there test cases for a tree of multi-valued fields?

pmaria commented 5 years ago

Hi @VladimirAlexiev, @andimou,

@VladimirAlexiev thanks for igniting this discussion. I've been meaning to for a while, but haven't taken the time...

I agree that the described multiplicity is natural for RML. I do however think some work needs to be done on the spec and the RML vocabulary to clarify this aspect. I'm willing to help with that if necessary. For example, does this apply to all term maps? I also recently had a short discussion with @dachafra about how this affects joins: carml/carml#78. I haven't had time to look into that further either, but it would be nice to clarify that as well. It would be good to add some test cases to cover this aspect too.

In the meantime I'm hoping to have some time next month to look into deprecating the extension constructs in CARML. And from the implementation report I see I have some more stuff to fix. :)

andimou commented 5 years ago

I raised the issue to the community group, see https://www.w3.org/community/kg-construct/track/issues/3. Perhaps it can be discussed there!

andimou commented 5 years ago

@andimou

  • And if there are nested multi-value fields, this can naturally lead to a whole tree of triples generated from a couple of mappings?

@VladimirAlexiev what exactly do you mean by tree? If multiple alternative values are returned, I would expect one RDF term to be generated for each returned value.

  • Do the other 3 RML implementations support such behavior?

Can't say for the other implementations! I guess we would be able to say if there were a relevant test case.

  • @pmaria, do you think some clarification is needed in the RML spec?

@pheyvaer Are there test cases for a tree of multi-valued fields?

The current test cases cover what the R2RML spec defines, but it makes sense to add a test case for multiple values.

andimou commented 5 years ago

Apparently it was clarified in this publication https://ieeexplore.ieee.org/document/6882016 but I think we should agree on a certain interpretation (perhaps we need to reconsider?) and update the spec with this.

I quote here what was mentioned then:

Generating a subject but multiple objects on an iteration: While in R2RML a certain column name occurs only once, in the case of RML, an expression specified at a Term Map could be satisfied more times. As generating a single subject per iteration is a fundamental assumption of R2RML’s definition, RML keeps the restriction that any reference used in the Subject Map definition should occur only once in the extract of data returned from a certain iteration. However, RML does not put any restrictions when the reference included in a Predicate or Object Map is satisfied more than once and, thus multiple predicates or objects are generated
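One way to read the quoted rule is sketched below in Python. This is an interpretation, not behavior confirmed by the spec or by any RML implementation: the function name `generate_triples` is hypothetical, raising an error on a multi-valued subject reference is an assumption (the quote states the restriction but not how a processor should react), and only multiple objects are covered, not multiple predicates.

```python
def generate_triples(subject_values, predicate, object_values):
    """Apply the quoted rule to the values extracted in one iteration.

    subject_values: values returned for the Subject Map reference
    object_values:  values returned for the Object Map reference
    """
    # Assumption: a violation of the single-subject restriction is an error.
    if len(subject_values) != 1:
        raise ValueError(
            f"expected exactly one subject value per iteration, got {len(subject_values)}"
        )
    subject = subject_values[0]
    # No restriction on the object side: one triple per returned value.
    return [(subject, predicate, obj) for obj in object_values]


ok = generate_triples(["article1"], "author", ["author1", "author2"])
print(ok)

try:
    generate_triples(["article1", "article2"], "author", ["author1"])
except ValueError as err:
    print("rejected:", err)
```

Whether a processor should reject the second call, or instead generate one subject per value, is exactly the kind of question an agreed interpretation would need to settle.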

VladimirAlexiev commented 5 years ago

@andimou by "tree" I mean that if a multi-valued object X has further data nested under it, that should produce multiple triples with X as subject.

For example, a JSON structure like this, with multiple values for author and affiliation:

article
  author
    affiliation

could cause triples like this:

<article1> :authorship <author1>, <author2>.
<author1> :name "author1"; :affiliation "affiliation11", "affiliation12".
<author2> :name "author2"; :affiliation "affiliation21", "affiliation22", "affiliation23".
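To make the expected output concrete, here is a small Python sketch that walks such a nested structure and emits the triples listed above. The JSON record is hypothetical (the thread only gives the outline), the function name `article_triples` is made up, and using the author name as the author identifier is an assumption made to match the example. It shows the intended result, not how an RML mapping would express it.

```python
# Hypothetical JSON record; field names mirror the outline above.
article = {
    "id": "article1",
    "author": [
        {"name": "author1", "affiliation": ["affiliation11", "affiliation12"]},
        {
            "name": "author2",
            "affiliation": ["affiliation21", "affiliation22", "affiliation23"],
        },
    ],
}


def article_triples(article):
    """Emit one triple per value, descending into each multi-valued author."""
    triples = []
    for author in article["author"]:
        author_id = author["name"]  # assumption: the name doubles as identifier
        triples.append((article["id"], ":authorship", author_id))
        triples.append((author_id, ":name", author["name"]))
        # Nested multi-value field: each author becomes a subject in turn.
        for affiliation in author["affiliation"]:
            triples.append((author_id, ":affiliation", affiliation))
    return triples


triples = article_triples(article)
for s, p, o in triples:
    print(s, p, o)
```

The nested loop is what the "tree" amounts to: each value of the outer field becomes the subject for the values of the inner field, so affiliations stay attached to the right author.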