SEMICeu / LinkedDataEventStreams

The Linked Data Event Streams specification
https://tree.linkeddatafragments.org/linked-data-event-streams/

Add versioning metadata to the Collection #21

Closed hdelva closed 2 years ago

hdelva commented 2 years ago

Publishing a dataset often entails adding some additional statements (e.g., dct:isVersionOf), but these statements then become indistinguishable from the original data elements. This can be an issue for operations such as version materialization, which should yield the current version of each concept as it would appear without this LDES-specific metadata.

The specification currently describes how to add a version key to a retention policy, but this cannot be used for collections without a retention policy. Furthermore, the version key only specifies which predicate is used to link the version URI to the concept URI, but there's often another predicate that is used to assign a timestamp to this version. This timestamp metadata would also be useful for other issues, such as #16.

I would propose to move the versioning metadata (the version key and timestamp predicate) to the Collection description, and possibly make it mandatory. Perhaps it can become part of the shape description.

pietercolpaert commented 2 years ago

I agree with the suggestion to define this on top of the LDES entity. I think defining it should be a best practice, but not a requirement. Defining it brings a lot of benefits, but functionally everything could keep working without it.

Other use cases

This information would be needed to understand:

  1. how an automatic version materialization could work (related: https://github.com/TREEcg/event-stream-client/issues/13)
  2. how a version-based retention policy would know what the “last” version is (related: #16)
  3. on which property the members in one page need to be ordered to understand how the collection grows, and thus also which members in the page can be automatically discarded when processing such a page again (this is a use case currently hard-coded in the Event Stream client: https://github.com/TREEcg/event-stream-client/blob/31444acb768639745d219b286e050e002f7f38d1/packages/actor-init-ldes-client/lib/EventStream.ts#L209)
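Use case 3 can be illustrated with a minimal sketch. It assumes the members of a page have already been parsed into plain Python dicts keyed by predicate and that the timestamp path is dcterms:created. The names `new_members` and `last_seen` are hypothetical, and this is not how the Event Stream client implements it.

```python
from datetime import datetime

TIMESTAMP_PATH = "dcterms:created"  # assumed value of ldes:timestampPath


def parse_ts(value):
    """Parse an xsd:dateTime string such as 2020-10-05T11:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def new_members(page, last_seen=None):
    """Order a page's members by timestamp and skip the ones whose
    timestamp is at or before `last_seen` (seen on an earlier fetch)."""
    ordered = sorted(page, key=lambda m: parse_ts(m[TIMESTAMP_PATH]))
    if last_seen is None:
        return ordered
    return [m for m in ordered if parse_ts(m[TIMESTAMP_PATH]) > last_seen]
```

Note that comparing on the timestamp alone would skip a member that shares its timestamp with the last processed one, so a real client would likely also need to track member identifiers.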

Suggestion

ex:ES1 a ldes:EventStream ;
       tree:shape <...> ;
       ldes:versionOfPath dcterms:isVersionOf ;
       ldes:timestampPath dcterms:created .

Effects

Retention policy

This part of the spec won’t really change:

A version-based retention policy can be defined based on the original collection’s data, but this can also be overridden in the policy itself. The policy itself can also have the property ldes:versionKey, which is an rdf:List of object identifier paths indicating that they must be combined. This is particularly useful in, e.g., the use case of sensor data, to indicate the last 5 sensor observations of a sensor’s observed property (ldes:versionKey ( ( sosa:observedProperty ) ( sosa:madeBySensor ) ) .).
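How a combined version key could drive such a policy is sketched below. This is only an illustration under assumptions: members are plain dicts keyed by predicate, each path in the version key is simplified to a single predicate, and `retain_latest` and `amount` are hypothetical names rather than anything defined by the specification.

```python
from collections import defaultdict
from datetime import datetime


def parse_ts(value):
    """Parse an xsd:dateTime string such as 2020-10-05T11:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def retain_latest(members, version_key, timestamp_path, amount):
    """Keep the `amount` most recent members for every distinct
    combination of version key values."""
    groups = defaultdict(list)
    for member in members:
        # Combine the values of all key paths into one grouping key.
        key = tuple(member[path] for path in version_key)
        groups[key].append(member)

    kept = []
    for group in groups.values():
        group.sort(key=lambda m: parse_ts(m[timestamp_path]), reverse=True)
        kept.extend(group[:amount])
    return kept
```

For the sensor example, calling `retain_latest(members, ["sosa:observedProperty", "sosa:madeBySensor"], "dcterms:created", 5)` would keep the last 5 observations per observed property and sensor.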

Version Materializations spec proposal

An official version materialization can be defined only if the original LDES defines both ldes:versionOfPath and ldes:timestampPath.

A version materialization replaces the subject of a member with its ldes:versionOfPath IRI, and filters the data to match a certain version identifier, or to select the latest version of the members until a certain version literal.

A version materialization thus converts e.g., an LDES like this:

ex:ES1 a ldes:EventStream ; # plus the proposed metadata, see above
     tree:member [
         dcterms:isVersionOf <A> ;
         dcterms:created "2020-10-05T11:00:00Z" ;
         owl:versionInfo "v0.0.1";
         rdfs:label "A v0.0.1"
     ], [
         dcterms:isVersionOf <A> ;
         dcterms:created "2020-10-06T13:00:00Z";
         owl:versionInfo "v0.0.2";
         rdfs:label "A v0.0.2"
     ].

into

ex:ES1v1 a tree:Collection ; # the members are no longer immutable
            ldes:versionMaterializationOf ex:ES1 ;
            ldes:versionMaterializationUntil "2020-10-05T12:00:00Z"^^xsd:dateTime ;
            tree:member <A> .

<A> rdfs:label "A v0.0.1" .
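
The materialization step can be sketched as follows. This is a minimal sketch, not a normative algorithm: it assumes members are plain dicts keyed by predicate, and `materialize` and `until` are hypothetical names. It only strips the two declared paths; the example above also omits owl:versionInfo, but which other statements should be removed is not specified in this proposal.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an xsd:dateTime string such as 2020-10-05T11:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def materialize(members, version_of_path, timestamp_path, until):
    """Return {concept IRI: properties} holding, for each concept, its
    latest version with a timestamp at or before `until`."""
    latest = {}
    for member in members:
        ts = parse_ts(member[timestamp_path])
        if ts > until:
            continue  # version is newer than the requested cut-off
        concept = member[version_of_path]
        if concept not in latest or ts > latest[concept][0]:
            latest[concept] = (ts, member)

    # Re-key each selected version on its concept IRI and drop the
    # versioning metadata itself.
    dropped = (version_of_path, timestamp_path)
    return {
        concept: {p: v for p, v in member.items() if p not in dropped}
        for concept, (_, member) in latest.items()
    }
```

With the two versions of <A> above and `until` set to 2020-10-05T12:00:00Z, only the v0.0.1 statements remain, now attached to <A>.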