SEMICeu / LinkedDataEventStreams

The Linked Data Event Streams specification
https://tree.linkeddatafragments.org/linked-data-event-streams/

ActivityStreams-based retention policies for removed entities #50

Open pietercolpaert opened 6 months ago

pietercolpaert commented 6 months ago

Some back-end systems can expose an event stream of the most recently updated items but do not keep track of items that have been deleted. For such systems, a retention policy should exist stating that the LDES conceptually does contain the as:Remove activity but that it has not been included. The removal can instead be inferred from the fact that the earlier included as:Create is no longer part of this LDES.

Use cases:

pietercolpaert commented 6 months ago

We could also have type-based retention policies. For example, we only keep members of type as:Remove for 1 year.

Proposal:

1. An ldes:ImplicitRemovalPolicy

This policy says that a consumer will have to infer removals from the fact that something isn’t available on a follow-up traversal of all members in a view.

<> ldes:retentionPolicy [
   a ldes:ImplicitRemovalPolicy ;
   ldes:type as:Remove
].
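A consumer honoring such an ldes:ImplicitRemovalPolicy could reconstruct the missing removals by comparing two full traversals of the view. The Python sketch below is a minimal illustration. It assumes each member can be mapped to the identifier of the object it is a version of. `Member` and `infer_removals` are hypothetical names and are not part of the LDES specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    entity: str    # identifier of the object this member is a version of
    activity: str  # activity type, e.g. "as:Create" or "as:Update"


def infer_removals(previous: list[Member], current: list[Member]) -> list[Member]:
    """Synthesize the as:Remove activities the view did not publish.

    Assumption: an object that appeared in the previous full traversal of the
    view but is absent from the current one has been removed.
    """
    seen_before = {m.entity for m in previous}
    seen_now = {m.entity for m in current}
    return [Member(e, "as:Remove") for e in sorted(seen_before - seen_now)]


if __name__ == "__main__":
    first = [Member("ex:a", "as:Create"), Member("ex:b", "as:Create")]
    second = [Member("ex:a", "as:Update")]
    print(infer_removals(first, second))
```

This only works if the consumer can afford a complete traversal each time. A removal that happens and is followed by a re-creation between two traversals remains invisible.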

2. Adding type filters to other retention policies

By default, when no retention policy is defined, retention is maximal. When at least one retention policy is explicitly defined, the view promises to retain the AND of all policies. This makes the following example tedious, because we need an OR. Either we keep the removal for 1 year, or the object still exists. In the latter case we keep its latest version until it is removed, after which we keep the removal for 1 year.

<> ldes:retentionPolicy [
   a ldes:DurationAgoPolicy ;
   tree:value "P1Y"^^xsd:duration ;
   ldes:type as:Remove
],[
   a ldes:LatestVersionSubset ;
   ldes:amount 1 ;
   ldes:exceptType as:Remove
] .

Adding ldes:type to a retention policy could restrict that policy to members of a specific type.

However, the difficulty is that the policies must be interpreted as an AND. As a result, the first retention policy will never take effect: the removal will always be the latest version of an object, and the second policy promises to keep that version. We should therefore introduce an ldes:exceptType clause, stating that the latest-version retention policy does not apply to members of type as:Remove.
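To make the AND/OR problem concrete, here is a Python sketch that evaluates the two policies from the example over a small member log. The sketch rests on two assumptions:

- "The view promises to retain the AND of all policies" means that every member selected by any single policy must be kept.
- The latest version is determined over all of an object's members before the type filter is applied.

`only_type` and `except_type` model the proposed ldes:type and ldes:exceptType. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Member:
    entity: str          # object this member is a version of
    activity: str        # e.g. "as:Create", "as:Update", "as:Remove"
    timestamp: datetime


def in_scope(policy, m: Member) -> bool:
    """Apply the proposed (hypothetical) ldes:type / ldes:exceptType filters."""
    if policy.only_type is not None and m.activity != policy.only_type:
        return False
    if policy.except_type is not None and m.activity == policy.except_type:
        return False
    return True


@dataclass(frozen=True)
class DurationAgoPolicy:
    duration: timedelta
    only_type: Optional[str] = None    # models ldes:type
    except_type: Optional[str] = None  # models ldes:exceptType

    def selects(self, members: list[Member], now: datetime) -> set[Member]:
        return {m for m in members
                if in_scope(self, m) and m.timestamp >= now - self.duration}


@dataclass(frozen=True)
class LatestVersionSubset:
    amount: int
    only_type: Optional[str] = None
    except_type: Optional[str] = None

    def selects(self, members: list[Member], now: datetime) -> set[Member]:
        by_entity: dict[str, list[Member]] = {}
        for m in members:
            by_entity.setdefault(m.entity, []).append(m)
        kept: set[Member] = set()
        for versions in by_entity.values():
            versions.sort(key=lambda v: v.timestamp)
            # "Latest" is decided over all versions and the type filter is applied
            # afterwards, so a removed object's earlier as:Create is not kept either.
            latest = versions[max(0, len(versions) - self.amount):]
            kept.update(m for m in latest if in_scope(self, m))
        return kept


def retained(members: list[Member], policies, now: datetime) -> set[Member]:
    """Each policy is a promise, so the view keeps every member any policy selects."""
    kept: set[Member] = set()
    for policy in policies:
        kept |= policy.selects(members, now)
    return kept


def example_log() -> list[Member]:
    d = datetime
    return [
        Member("ex:a", "as:Create", d(2022, 1, 1)),
        Member("ex:a", "as:Remove", d(2022, 6, 1)),  # removed more than a year ago
        Member("ex:b", "as:Create", d(2023, 1, 1)),
        Member("ex:b", "as:Remove", d(2024, 3, 1)),  # removed within the last year
        Member("ex:c", "as:Create", d(2021, 1, 1)),
        Member("ex:c", "as:Update", d(2024, 1, 1)),  # still exists
    ]


def show(kept: set[Member]) -> list[tuple[str, str]]:
    return sorted((m.entity, m.activity) for m in kept)


if __name__ == "__main__":
    now = datetime(2024, 6, 1)
    one_year = timedelta(days=365)
    naive = [DurationAgoPolicy(one_year, only_type="as:Remove"), LatestVersionSubset(1)]
    fixed = [DurationAgoPolicy(one_year, only_type="as:Remove"),
             LatestVersionSubset(1, except_type="as:Remove")]
    print(show(retained(example_log(), naive, now)))
    print(show(retained(example_log(), fixed, now)))
```

The first printed line shows the year-old removal of ex:a surviving through the latest-version policy. The second line, with `except_type` set, drops that removal but keeps the recent removal of ex:b and the current state of ex:c.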

sandervd commented 6 months ago

I think what we really want to achieve is a retention policy that keeps the latest state (1 version) plus a time window, allowing clients to see all writes. Related: https://github.com/SEMICeu/LinkedDataEventStreams/issues/36
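Here is a minimal Python sketch of such a "latest state plus time" policy. It keeps the newest member per object plus every member inside a sliding window. It assumes member timestamps reflect when each write happened. Under that assumption, a client that syncs at least once per window sees every write, and a new client can still bootstrap from the latest state. This amounts to the union of a LatestVersionSubset with amount 1 and a DurationAgoPolicy. `latest_plus_window` and `Member` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Member:
    entity: str          # object this member is a version of
    activity: str        # e.g. "as:Create", "as:Update", "as:Remove"
    timestamp: datetime  # assumed to be the time of the write


def latest_plus_window(members: list[Member], window: timedelta,
                       now: datetime) -> list[Member]:
    """Keep the latest version of every object plus every member newer than `window`."""
    latest: dict[str, Member] = {}
    for m in members:
        current = latest.get(m.entity)
        if current is None or m.timestamp > current.timestamp:
            latest[m.entity] = m
    recent = {m for m in members if m.timestamp >= now - window}
    return sorted(set(latest.values()) | recent, key=lambda m: (m.timestamp, m.entity))


if __name__ == "__main__":
    log = [
        Member("ex:a", "as:Create", datetime(2024, 1, 1)),
        Member("ex:a", "as:Update", datetime(2024, 5, 20)),  # neither latest nor recent
        Member("ex:a", "as:Update", datetime(2024, 5, 28)),
        Member("ex:a", "as:Update", datetime(2024, 5, 30)),
        Member("ex:b", "as:Create", datetime(2023, 1, 1)),   # old, but latest state of ex:b
    ]
    for m in latest_plus_window(log, timedelta(days=7), datetime(2024, 6, 1)):
        print(m.entity, m.activity, m.timestamp.date())
```

In this sketch, an as:Remove that is the newest member of an object is kept indefinitely, because it is the latest state.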

We could get some inspiration from the semantics in Kafka, as they handle the same problem: https://docs.confluent.io/kafka/design/log_compaction.html#compaction-enables-deletes

We could also consider this as a change to the version retention semantics; when the latest version is a tombstone, all versions are deleted?
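Kafka's log compaction, as described on the linked Confluent page, removes earlier values for a key during compaction once a tombstone has been written. It removes the tombstone itself after a configurable delete-retention period (`delete.retention.ms`). The Python sketch below applies those assumed semantics to version retention:

- Keep the latest `amount` versions per object.
- If the latest version is an as:Remove, drop every earlier version.
- Keep the as:Remove only for `tombstone_retention`.

The function and parameter names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Member:
    entity: str          # object this member is a version of
    activity: str        # e.g. "as:Create", "as:Update", "as:Remove"
    timestamp: datetime


TOMBSTONE = "as:Remove"


def retain_versions(members: list[Member], amount: int,
                    tombstone_retention: timedelta, now: datetime) -> list[Member]:
    """Latest-`amount` version retention with Kafka-like tombstone handling."""
    by_entity: dict[str, list[Member]] = defaultdict(list)
    for m in members:
        by_entity[m.entity].append(m)
    kept: list[Member] = []
    for versions in by_entity.values():
        versions.sort(key=lambda v: v.timestamp)
        last = versions[-1]
        if last.activity == TOMBSTONE:
            # Latest version is a tombstone: all earlier versions are dropped, and
            # the tombstone itself is kept only while it is younger than the retention.
            if last.timestamp >= now - tombstone_retention:
                kept.append(last)
        else:
            kept.extend(versions[max(0, len(versions) - amount):])
    return sorted(kept, key=lambda m: (m.timestamp, m.entity))
```

For example, with `amount=2` and a 30-day tombstone retention, an object removed 12 days ago shows up only as its as:Remove. An object removed five months ago disappears from the view entirely.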