SEMICeu / LinkedDataEventStreams

The Linked Data Event Streams specification
https://tree.linkeddatafragments.org/linked-data-event-streams/

Add and specify support for named graphs #37

Open pietercolpaert opened 1 year ago

pietercolpaert commented 1 year ago

There is currently a proposal to support named graph serializations like TriG and JSON-LD.

The proposal would be to allow something like this:

ex:C1 a ldes:EventStream ;
      tree:member ex:Graph1, ex:Graph2 .

ex:Graph1 prov:generatedAtTime "2022-12-25T12:00" ;
                    foaf:primaryTopic ex:Streetname1 .

ex:Graph1 {
   ex:Streetname1 rdfs:label "Station Street" .
}

ex:Graph2 prov:generatedAtTime "2023-12-25T12:00" ;
                    foaf:primaryTopic ex:Streetname1 .

ex:Graph2 {
   ex:Streetname1 rdfs:label "Station Square" .
}
pietercolpaert commented 1 year ago

As this is an addition to the TREE vocabulary, we might want to discuss this in the TREE hypermedia community group instead.

pietercolpaert commented 1 year ago

I currently still see two options here:

Option 1: The LDES can only work with named graphs

As in the above example.

This means that the named graph must always be interpreted throughout the pipeline, even though the member itself only needs to be processed once.

Option 2: The LDES view works with named graphs only to make the member extraction algorithm more efficient

Same idea, but now the named graphs only live in a view on top of an LDES whose members are version objects, acting as a compatibility layer towards quad-based serializations. The same LDES can still be published in RDF 1.0.

Example:

ex:C1 a ldes:EventStream ;
      tree:view <currentpage> ;
      tree:member ex:M1v1, ex:M1v2 .

<currentpage> tree:namedGraphMembers true . # The property being used on a view instead of on top of the LDES → a client may ignore the named graphs

ex:M1v1 prov:generatedAtTime "2022-12-25T12:00"  .

ex:M1v1 {
   ex:M1v1 rdfs:label "Station Street" ;
                   dcterms:isVersionOf   ex:M1 .
}

ex:M1v2 prov:generatedAtTime "2023-12-25T12:00" .

ex:M1v2 {
   ex:M1v2 rdfs:label "Station Square" ;
                   dcterms:isVersionOf ex:M1 .
}
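To see why the named graphs in option 2 are only a compatibility layer, the sketch below flattens the example above into plain Python tuples (a hypothetical representation, not a real RDF library) and extracts each member twice: once graph-aware, reading the member's named graph plus its default-graph metadata, and once graph-unaware, simply taking every triple whose subject is the member.

```python
from typing import NamedTuple, Optional


# Hypothetical quad representation: plain strings stand in for IRIs and literals.
class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: Optional[str]  # None means the default graph


Triple = tuple[str, str, str]

# The option 2 example above, flattened into quads.
QUADS = [
    Quad("ex:C1", "tree:member", "ex:M1v1", None),
    Quad("ex:C1", "tree:member", "ex:M1v2", None),
    Quad("ex:M1v1", "prov:generatedAtTime", "2022-12-25T12:00", None),
    Quad("ex:M1v1", "rdfs:label", "Station Street", "ex:M1v1"),
    Quad("ex:M1v1", "dcterms:isVersionOf", "ex:M1", "ex:M1v1"),
    Quad("ex:M1v2", "prov:generatedAtTime", "2023-12-25T12:00", None),
    Quad("ex:M1v2", "rdfs:label", "Station Square", "ex:M1v2"),
    Quad("ex:M1v2", "dcterms:isVersionOf", "ex:M1", "ex:M1v2"),
]


def extract_graph_aware(quads: list[Quad], member: str) -> set[Triple]:
    """Contents of the member's named graph plus default-graph triples about it."""
    return {(q.s, q.p, q.o) for q in quads
            if q.g == member or (q.g is None and q.s == member)}


def extract_graph_unaware(quads: list[Quad], member: str) -> set[Triple]:
    """Drop graph names entirely and keep every triple whose subject is the member."""
    return {(q.s, q.p, q.o) for q in quads if q.s == member}


for m in ("ex:M1v1", "ex:M1v2"):
    # Both routes yield the same member description in option 2.
    assert extract_graph_aware(QUADS, m) == extract_graph_unaware(QUADS, m)
```

The equivalence only holds because the graph name, the member IRI and the subject coincide. Once a named graph contains triples about other subjects, as in the first example where the graph ex:Graph1 describes ex:Streetname1, the graph-unaware route misses them.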

A side effect of option 2 is that, when named graphs are used, a member graph can also contain two disjoint subgraphs.

sandervd commented 1 year ago

I'm advocating for a one-subject-one-member model. Consistent with this idea, an alternative serialization could be:

ex:C1 a ldes:EventStream ;
      tree:view <currentpage> ;
      tree:member ex:M1v1, ex:M1v2 .

<currentpage> tree:namedGraphMembers true . # The property being used on a view instead of on top of the LDES → a client may ignore the named graphs

ex:M1v1 prov:generatedAtTime "2022-12-25T12:00" ;
                   tree:inGraph <someNamedGraph> ;
                   rdfs:label "Station Street" ;
                   dcterms:isVersionOf   ex:M1 .

ex:M1v2 prov:generatedAtTime "2023-12-25T12:00" ;
                   tree:inGraph <someNamedGraph> ;
                   rdfs:label "Station Square" ;
                   dcterms:isVersionOf ex:M1 .

In this model, it would be best to limit the cardinality of tree:inGraph to 1: a new version object would be needed for every upsert or delete of an object in a given graph, since otherwise the semantics of updates would be unclear.

pietercolpaert commented 1 year ago

@sandervd Your proposal wouldn’t work for disjoint member triples (something that becomes possible when using named graphs, but not in the current LDES design).

The user story for using LDES with named graphs is the following:

As a data broker maintainer, I have a streaming pipeline in which I rely on RDF 1.1 named graphs to scope my messages. Now I want to federate with other brokers and publish my streams as LDESes.

The problem is that in this design, the RDF 1.1 updates may contain disjoint graph patterns. Example:

Quad view:

<Member1> {
  <A> <B> <C> .
  <D> <E> <F> .
}

<Member2> {
  <D> <G> <H> .
  <I> <J> <K> .
}

If we introduce named graphs, we must therefore always assume that members MAY contain disjoint basic graph patterns (BGPs), something we simply cannot represent in triples.
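The disjointness can be made concrete with a small check. The sketch below (plain Python over hypothetical quad tuples, not a real RDF library) groups each member's triples into connected components, linked through shared subject or object nodes, using a union-find. Both members of the quad view above split into two components, so no single subject reaches all of a member's triples, which is why a subject-based extraction over plain triples cannot recover them.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: Optional[str]  # graph name; None would be the default graph


Triple = tuple[str, str, str]

# The "quad view" example above; each graph name is a member.
QUADS = [
    Quad("A", "B", "C", "Member1"),
    Quad("D", "E", "F", "Member1"),
    Quad("D", "G", "H", "Member2"),
    Quad("I", "J", "K", "Member2"),
]


def member_triples(quads: list[Quad], member: str) -> list[Triple]:
    """Triples in the named graph called `member`."""
    return [(q.s, q.p, q.o) for q in quads if q.g == member]


def components(triples: list[Triple]) -> list[list[Triple]]:
    """Split triples into groups connected through shared subject/object nodes."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, _, o in triples:
        parent[find(s)] = find(o)  # merge the subject's and object's components

    groups: dict[str, list[Triple]] = {}
    for t in triples:
        groups.setdefault(find(t[0]), []).append(t)
    return list(groups.values())


for member in ("Member1", "Member2"):
    print(member, len(components(member_triples(QUADS, member))))
```

Predicates are deliberately not treated as links here; only subject and object positions connect triples.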

We therefore MUST support a property on an ldes:EventStream entity saying that the whole LDES can only work in quad-based serializations.

The question remains whether we should also support quad-based serializations on top of views of triple-based LDESes. After giving it some thought, I think it would only add complexity without any additional benefit.

sandervd commented 1 year ago

Maybe I misunderstand the disjointness issue, as I currently don't see a problem. Would it be correct to state that the main issue we are trying to solve is the following:

The following reasoning only holds under the assumption of a model in which a member represents a version of one object in time. In this one-object-one-member model (objects/members are reduced to key-value pairs), determining the scope is quite straightforward: all triples with the same subject and their 'descendants' (blank nodes) form the scope of the update. If we add named graphs to the mix, the scope of the update becomes the graph-subject pair instead of the subject. That is why I would propose to limit the cardinality of tree:inGraph to 0..1: if the property is missing, the default graph is assumed; otherwise, the version object reflects the update of a given named object in a named graph.
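The scoping rule in this comment can be sketched in a few lines of Python, assuming triples are plain string tuples and blank nodes use a "_:" prefix (both hypothetical conventions, not a real RDF library). `scope_triples` collects a subject's triples plus those of its blank-node descendants, and `scope_key` returns the graph-subject pair, with `None` standing for the default graph when tree:inGraph is missing.

```python
from typing import Optional

Triple = tuple[str, str, str]


def is_blank(term: str) -> bool:
    """Blank nodes are written with a Turtle-style "_:" prefix in this sketch."""
    return term.startswith("_:")


def scope_triples(triples: list[Triple], subject: str) -> list[Triple]:
    """Triples with `subject` as subject, plus those of its blank-node descendants."""
    result: list[Triple] = []
    todo, seen = [subject], set()
    while todo:
        node = todo.pop()
        if node in seen:  # guard against blank-node cycles
            continue
        seen.add(node)
        for t in triples:
            if t[0] == node:
                result.append(t)
                if is_blank(t[2]):
                    todo.append(t[2])
    return result


def scope_key(triples: list[Triple], version: str) -> tuple[Optional[str], Optional[str]]:
    """(graph, object) pair an update applies to; a None graph means the default graph."""
    def value(pred: str) -> Optional[str]:
        return next((o for s, p, o in triples if s == version and p == pred), None)
    return value("tree:inGraph"), value("dcterms:isVersionOf")


# ex:M1v1 follows the example below; the blank node and ex:M3v1 are hypothetical.
TRIPLES: list[Triple] = [
    ("ex:M1v1", "tree:inGraph", "<someNamedGraph>"),
    ("ex:M1v1", "rdfs:label", "Station Street"),
    ("ex:M1v1", "dcterms:isVersionOf", "ex:M1"),
    ("ex:M1v1", "rdfs:seeAlso", "_:b0"),
    ("_:b0", "rdfs:comment", "nested detail"),
    ("ex:M3v1", "rdfs:label", "Market Square"),
    ("ex:M3v1", "dcterms:isVersionOf", "ex:M3"),
]

print(scope_key(TRIPLES, "ex:M1v1"), len(scope_triples(TRIPLES, "ex:M1v1")))
print(scope_key(TRIPLES, "ex:M3v1"))
```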

To build on the previous examples:

ex:C1 a ldes:EventStream ;
      tree:view <currentpage> ;
      tree:member ex:M1v1, ex:M1v2, ex:M1v3, ex:M2v1 .

<currentpage> tree:namedGraphMembers true . # The property being used on a view instead of on top of the LDES → a client may ignore the named graphs

ex:M1v1 prov:generatedAtTime "2022-12-25T12:00" ;
                   tree:inGraph <someNamedGraph> ;
                   rdfs:label "Station Street" ;
                   dcterms:isVersionOf   ex:M1 .

ex:M1v2 prov:generatedAtTime "2022-12-25T12:01" ;
                   tree:inGraph <someOtherGraph> ;
                   rdfs:label "Ye Olde Station Street" ;
                   dcterms:isVersionOf   ex:M1 .

ex:M1v3 prov:generatedAtTime "2023-12-25T12:02" ;
                   tree:inGraph <someNamedGraph> ;
                   rdfs:label "Station Square" ;
                   dcterms:isVersionOf ex:M1 .

ex:M2v1 prov:generatedAtTime "2023-12-25T12:02" ;
                   tree:inGraph <someNamedGraph> ;
                   rdfs:label "Arthur Avenue" ;
                   dcterms:isVersionOf ex:M2 .         

This could be serialized to the following quads upon materialization:

<someNamedGraph> {
   ex:M1 rdfs:label "Station Square" .
   ex:M2 rdfs:label "Arthur Avenue" .
}

<someOtherGraph> {
   ex:M1 rdfs:label "Ye Olde Station Street" .
}

This approach does make adding a member to multiple named graphs verbose, as each graph-object pair requires a new version object. That said, I agree that the added complexity is probably not worth it, as a client would always need to be graph-aware in order to process named-graph LDESes.
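The materialization described in this comment can be sketched as a last-write-wins fold: keep the latest version per (graph, object) pair and emit it as a quad. The version objects below are plain Python dataclasses rather than parsed RDF, and ordering upserts by prov:generatedAtTime is an assumption; deletes are not modelled.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

Quad = tuple[str, str, str, Optional[str]]


@dataclass
class Version:
    id: str
    generated_at: str        # prov:generatedAtTime
    in_graph: Optional[str]  # tree:inGraph; None means the default graph
    label: str               # rdfs:label
    version_of: str          # dcterms:isVersionOf


# The four version objects from the example above.
VERSIONS = [
    Version("ex:M1v1", "2022-12-25T12:00", "<someNamedGraph>", "Station Street", "ex:M1"),
    Version("ex:M1v2", "2022-12-25T12:01", "<someOtherGraph>", "Ye Olde Station Street", "ex:M1"),
    Version("ex:M1v3", "2023-12-25T12:02", "<someNamedGraph>", "Station Square", "ex:M1"),
    Version("ex:M2v1", "2023-12-25T12:02", "<someNamedGraph>", "Arthur Avenue", "ex:M2"),
]


def materialize(versions: list[Version]) -> list[Quad]:
    """Keep the latest version per (graph, object) pair and emit it as a quad."""
    latest: dict[tuple[Optional[str], str], Version] = {}
    for v in versions:
        key = (v.in_graph, v.version_of)
        cur = latest.get(key)
        if cur is None or datetime.fromisoformat(v.generated_at) > datetime.fromisoformat(cur.generated_at):
            latest[key] = v
    quads = [(v.version_of, "rdfs:label", v.label, v.in_graph) for v in latest.values()]
    # Sort by graph, then subject; "" stands in for the default graph when sorting.
    return sorted(quads, key=lambda q: (q[3] or "", q[0]))


for quad in materialize(VERSIONS):
    print(quad)
```

Keying on the graph-object pair is what keeps "Ye Olde Station Street" alive in `<someOtherGraph>` while ex:M1v3 supersedes ex:M1v1 in `<someNamedGraph>`.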
pietercolpaert commented 10 months ago

The current proposal, to be ratified at the end of September in the TREE CG, is to always extract the triples in the named graph matching the target of a tree:member. That will allow the initial proposal to work.
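A minimal sketch of that extraction rule, applied to the initial proposal at the top of this issue, might look as follows. Quads are hypothetical Python tuples rather than a real RDF library, and reading tree:member statements only from the default graph is an assumption of this sketch.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: Optional[str]  # None means the default graph


Triple = tuple[str, str, str]

# The initial proposal from the top of this issue, flattened into quads.
QUADS = [
    Quad("ex:C1", "tree:member", "ex:Graph1", None),
    Quad("ex:C1", "tree:member", "ex:Graph2", None),
    Quad("ex:Graph1", "prov:generatedAtTime", "2022-12-25T12:00", None),
    Quad("ex:Graph1", "foaf:primaryTopic", "ex:Streetname1", None),
    Quad("ex:Streetname1", "rdfs:label", "Station Street", "ex:Graph1"),
    Quad("ex:Graph2", "prov:generatedAtTime", "2023-12-25T12:00", None),
    Quad("ex:Graph2", "foaf:primaryTopic", "ex:Streetname1", None),
    Quad("ex:Streetname1", "rdfs:label", "Station Square", "ex:Graph2"),
]


def extract_members(quads: list[Quad], stream: str) -> dict[str, list[Triple]]:
    """Map each tree:member target of `stream` to the triples of the equally named graph."""
    # Assumption: tree:member statements are read from the default graph only.
    targets = [q.o for q in quads
               if q.s == stream and q.p == "tree:member" and q.g is None]
    return {t: [(q.s, q.p, q.o) for q in quads if q.g == t] for t in targets}


members = extract_members(QUADS, "ex:C1")
for name, triples in members.items():
    print(name, triples)
```

Because the rule copies the whole named graph without following links from a subject, it also covers members with disjoint triples, such as `<Member1>` in the quad view earlier in this thread. Default-graph metadata such as prov:generatedAtTime is left out of the extracted member in this sketch.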