Closed hdelva closed 2 years ago
I agree with the suggestion to define this on top of the LDES entity. I think defining it should be a best practice rather than a requirement: you get a lot of benefits when it is defined, but functionally everything could keep working without it being described.
This information would be needed to understand:
ex:ES1 a ldes:EventStream ;
    tree:shape <...> ;
    ldes:versionOfPath dcterms:isVersionOf ;
    ldes:timestampPath dcterms:created .
ldes:versionOfPath is a property path (SHACL) to the property that will indicate a URI of the non-versioned object.
ldes:timestampPath is a property path (SHACL) to the timestamp.
This part of the spec won’t really change:
A version-based retention policy can be defined based on the original collection’s data, but can also be overwritten in the policy itself. The policy itself can also have the property ldes:versionKey, which is an rdf:List of object identifier paths indicating that they must be combined. This is particularly useful in e.g. the use case of sensor data to indicate the last 5 sensor observations of a sensor’s observed property (ldes:versionKey ( ( sosa:observedProperty ) ( sosa:madeBySensor ) ) .).
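The combined version key can be illustrated with a minimal Python sketch. It assumes members are plain dictionaries keyed by predicate, that the timestamp lives under sosa:resultTime, and that all timestamps are UTC strings in the same ISO 8601 layout (so they sort lexicographically). The function name `latest_versions` and its `amount` parameter are hypothetical and not taken from the spec.

```python
from collections import defaultdict


def latest_versions(members, version_key, timestamp_path, amount):
    """Keep the `amount` most recent members per combined version key."""
    groups = defaultdict(list)
    for member in members:
        # Combine the values of every path in the key into one tuple.
        key = tuple(member[path] for path in version_key)
        groups[key].append(member)

    kept = []
    for group in groups.values():
        # Assumption: same-format UTC timestamps compare correctly as strings.
        group.sort(key=lambda m: m[timestamp_path], reverse=True)
        kept.extend(group[:amount])
    return kept


# Hypothetical data: 7 temperature and 2 humidity observations from one sensor.
observations = [
    {"sosa:madeBySensor": "ex:s1", "sosa:observedProperty": "ex:temperature",
     "sosa:resultTime": f"2020-10-05T11:0{i}:00Z"}
    for i in range(7)
] + [
    {"sosa:madeBySensor": "ex:s1", "sosa:observedProperty": "ex:humidity",
     "sosa:resultTime": f"2020-10-05T11:0{i}:00Z"}
    for i in range(2)
]

kept = latest_versions(
    observations,
    version_key=("sosa:observedProperty", "sosa:madeBySensor"),
    timestamp_path="sosa:resultTime",
    amount=5,
)
print(len(kept))  # 7: five temperature observations plus both humidity ones
```

Because the key combines both paths, the humidity observations are not crowded out by the more frequent temperature observations of the same sensor.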
An official version materialization can be defined only if the original LDES defines both ldes:versionOfPath and ldes:timestampPath. A version materialization replaces the subject of a member with its ldes:versionOfPath IRI, and filters the data to match a certain version identifier, or to select the latest version of the members until a certain version literal.
A version materialization thus converts e.g., an LDES like this:
ex:ES1 a ldes:EventStream ; # + proposed metadata see ↑
    tree:member [
        dcterms:isVersionOf <A> ;
        dcterms:created "2020-10-05T11:00:00Z" ;
        owl:versionInfo "v0.0.1" ;
        rdfs:label "A v0.0.1"
    ], [
        dcterms:isVersionOf <A> ;
        dcterms:created "2020-10-06T13:00:00Z" ;
        owl:versionInfo "v0.0.2" ;
        rdfs:label "A v0.0.2"
    ] .
towards
ex:ES1v1 a tree:Collection ; # the members are no longer immutable
    ldes:versionMaterializationOf ex:ES1 ;
    ldes:versionMaterializationUntil "2020-10-05T12:00:00Z"^^xsd:dateTime ;
    tree:member <A> .
<A> rdfs:label "A v0.0.1" .
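The materialization step can be sketched in Python as follows. This is only an illustration under stated assumptions: members are modelled as plain dictionaries rather than RDF, the function name `materialize` is hypothetical, and only the two versioning properties are stripped. The example above also leaves out owl:versionInfo; whether such properties should be kept is not settled here, so the sketch keeps them.

```python
from datetime import datetime

VERSION_OF = "dcterms:isVersionOf"  # value of ldes:versionOfPath
TIMESTAMP = "dcterms:created"       # value of ldes:timestampPath


def parse_ts(value):
    # A trailing "Z" is only accepted by fromisoformat from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def materialize(members, until):
    """Map each entity IRI to its latest version created at or before `until`."""
    limit = parse_ts(until)
    latest = {}  # entity IRI -> (timestamp, member)
    for member in members:
        ts = parse_ts(member[TIMESTAMP])
        if ts > limit:
            continue  # this version did not exist yet at `until`
        entity = member[VERSION_OF]
        if entity not in latest or ts > latest[entity][0]:
            latest[entity] = (ts, member)
    # Re-attach the remaining properties to the non-versioned entity IRI.
    return {
        entity: {p: o for p, o in member.items()
                 if p not in (VERSION_OF, TIMESTAMP)}
        for entity, (_, member) in latest.items()
    }


members = [
    {"dcterms:isVersionOf": "A", "dcterms:created": "2020-10-05T11:00:00Z",
     "owl:versionInfo": "v0.0.1", "rdfs:label": "A v0.0.1"},
    {"dcterms:isVersionOf": "A", "dcterms:created": "2020-10-06T13:00:00Z",
     "owl:versionInfo": "v0.0.2", "rdfs:label": "A v0.0.2"},
]

snapshot = materialize(members, "2020-10-05T12:00:00Z")
print(snapshot["A"]["rdfs:label"])  # A v0.0.1
```

Without ldes:versionOfPath and ldes:timestampPath on the event stream, a client would have no way to know which two predicates to use for the grouping and the cut-off.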
Publishing a dataset often entails adding some additional statements (e.g., dct:isVersionOf), but this data then becomes indistinguishable from the original data elements. This can be an issue for operations such as version materializations, which should yield the current version of each concept, as it would appear without this LDES-specific metadata.
The specification currently describes how to add a version key to a retention policy, but this cannot be used for collections without a retention policy. Furthermore, the version key only specifies which predicate is used to link the version URI to the concept URI, but there's often another predicate that is used to assign a timestamp to this version. This timestamp metadata would also be useful for other issues, such as #16.
I would propose to move the versioning metadata (the version key and timestamp predicate) to the Collection description, and possibly make it mandatory. Perhaps it can become part of the shape description.