SEMICeu / LinkedDataEventStreams

The Linked Data Event Streams specification
https://tree.linkeddatafragments.org/linked-data-event-streams/

Separating metadata from LDES members #45

Closed DylanVanAssche closed 8 months ago

DylanVanAssche commented 8 months ago

LDES members are mangled with LDES metadata such as versionOf, timestamp, etc. This goes somewhat against the philosophy of the OSI model, which assumes you have an envelope with metadata in which the actual member data resides.

Problem

LDES member data is mixed with their metadata. For example:

<MemberJefke#8> a foaf:Person;
  foaf:name "Jefke";
  foaf:age "45";
  dct:created "2023-01-01T00:00:00.00Z";
  dct:versionOf <MemberJefke>;
.

In the example, the member describes a foaf:Person but also mixes in metadata about when the member was created. This gives a strange view of the person: for example, you could assume that the person was created on 2023-01-01, yet its age is 45, which is not possible. The bigger problem is that metadata about the member is integrated into the member itself instead of living separately and pointing to the member, for example:

<MemberJefkeMetadata#8> 
  dct:created "2023-01-01T00:00:00.00Z";
  dct:versionOf <MemberJefke>;
  dct:subject <MemberJefke#8>;
.
<MemberJefke#8> a foaf:Person;
  foaf:name "Jefke";
  foaf:age "45";
.
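
To make the separation concrete, here is a minimal Python sketch that splits the mixed member from the first example into an envelope and the plain member data. Triples are modelled as plain tuples of strings; the `split_member` helper and the `METADATA_PREDICATES` set are hypothetical names introduced for this illustration and are not part of LDES or TREE.

```python
# Assumption for this sketch: these predicates count as event metadata.
METADATA_PREDICATES = {"dct:created", "dct:versionOf"}


def split_member(member_id, triples, envelope_id):
    """Split a mixed member into (envelope triples, data triples).

    The envelope points back to the member via dct:subject,
    mirroring the second example above.
    """
    envelope = [(envelope_id, "dct:subject", member_id)]
    data = []
    for s, p, o in triples:
        if s == member_id and p in METADATA_PREDICATES:
            # Re-attach the metadata statement to the envelope node.
            envelope.append((envelope_id, p, o))
        else:
            data.append((s, p, o))
    return envelope, data


mixed_member = [
    ("<MemberJefke#8>", "rdf:type", "foaf:Person"),
    ("<MemberJefke#8>", "foaf:name", '"Jefke"'),
    ("<MemberJefke#8>", "foaf:age", '"45"'),
    ("<MemberJefke#8>", "dct:created", '"2023-01-01T00:00:00.00Z"'),
    ("<MemberJefke#8>", "dct:versionOf", "<MemberJefke>"),
]

envelope, data = split_member(
    "<MemberJefke#8>", mixed_member, "<MemberJefkeMetadata#8>"
)
```

After the split, `data` only holds statements about the person, while `envelope` holds the creation time, the versionOf link, and the pointer to the member.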

Similar approaches

CC: @sandervd

sandervd commented 8 months ago

I certainly agree that the lack of encapsulation between object and (event) metadata is a problem at the moment. However, I think that recent developments in the TREE community group offer opportunities: within TREE it is now allowed to use a named graph as a means to encapsulate state data. Graph encapsulation is the most elegant solution so far, keeping in mind that a fragment could hold multiple updates to the same object. If graphs are not an option, we would need to define a bijective function (serialize/de-serialize) that encapsulates the data. I made an attempt to define such a function here: https://github.com/sandervd/tree-experiments If possible, I'd opt for named graphs though, as this keeps the data as-is (state is readable without interpretation) while ensuring the necessary separation.
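
As a rough illustration of graph-based encapsulation, the Python sketch below wraps a member's state triples in a named graph and keeps the event metadata outside it, attached to the graph identifier. It is not the function from the linked repository; `encapsulate`, `decapsulate`, and the `<graph/...>` identifiers are hypothetical names chosen for this example, and quads are modelled as plain tuples.

```python
DEFAULT_GRAPH = None  # marker for the default graph in this sketch


def encapsulate(state_triples, graph_id, metadata):
    """Put state triples in a named graph; attach metadata to the graph id."""
    quads = [(s, p, o, graph_id) for s, p, o in state_triples]
    quads += [(graph_id, p, o, DEFAULT_GRAPH) for p, o in metadata]
    return quads


def decapsulate(quads, graph_id):
    """Inverse of encapsulate for a single named graph."""
    state = [(s, p, o) for s, p, o, g in quads if g == graph_id]
    metadata = [
        (p, o)
        for s, p, o, g in quads
        if g is DEFAULT_GRAPH and s == graph_id
    ]
    return state, metadata


state_v8 = [
    ("<MemberJefke>", "rdf:type", "foaf:Person"),
    ("<MemberJefke>", "foaf:age", '"45"'),
]
state_v9 = [
    ("<MemberJefke>", "rdf:type", "foaf:Person"),
    ("<MemberJefke>", "foaf:age", '"46"'),
]
meta_v8 = [("dct:created", '"2023-01-01T00:00:00.00Z"')]
meta_v9 = [("dct:created", '"2024-01-01T00:00:00.00Z"')]

# One fragment holding two updates to the same object, one graph per update.
fragment = encapsulate(state_v8, "<graph/8>", meta_v8) + encapsulate(
    state_v9, "<graph/9>", meta_v9
)

restored_state, restored_meta = decapsulate(fragment, "<graph/8>")
```

Because each update lives in its own graph, the state triples can use the object's real identifier unchanged, and two versions of the same object can sit side by side in one fragment without their statements mixing.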

What I would very much like to define is a strict transactional specification of LDES, allowing exact named graph replication over the web. Some related issues: https://github.com/TREEcg/specification/pull/78 https://github.com/SEMICeu/LinkedDataEventStreams/issues/24 https://github.com/SEMICeu/LinkedDataEventStreams/issues/37

pietercolpaert commented 8 months ago

I agree as well, and I don’t think encapsulation is the real issue anymore: the member extraction algorithm in TREE now allows for named graphs if you really want a clear cut. The benefit of LDES, however, is that your event stream can also just contain your already existing lifecycled objects, so you can omit extra envelopes when you don’t need them.

What I’m very interested in, though, is the transactional aspect: how do we indicate that we have a consistent knowledge graph across LDESes? For instance, what if we split Linked OpenStreetMap into 3 LDESes: one for nodes, one for ways and one for relations. Then, for each LDES individually, multiple objects are added in one transaction. Processing just one member would not be very valuable, as you would get an inconsistent replication: the osm:Way would, for example, not yet have the necessary osm:Nodes to point to.

A solution would be a transaction system that indicates whether a transaction is still in progress and defines its bounds (probably based on a timestamp?). The set of members can then only be processed into a derived view or service from the moment the transaction is marked as completed.
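
On the consumer side, this idea could look roughly like the Python sketch below: members are buffered per transaction and only released to the derived view once that transaction is marked as completed. The `TransactionBuffer` class, its method names, and the `tx-1` identifier are hypothetical; nothing here is defined by LDES or TREE, and how a transaction is identified (for example by timestamp bounds) is left open.

```python
from collections import defaultdict


class TransactionBuffer:
    """Hold back members until their transaction is marked as completed."""

    def __init__(self):
        self._pending = defaultdict(list)  # transaction id -> buffered members
        self.applied = []                  # members visible to the derived view

    def add_member(self, tx_id, member):
        # While the transaction is in progress, members are only buffered.
        self._pending[tx_id].append(member)

    def complete(self, tx_id):
        # Release all members of the transaction at once, in arrival order.
        members = self._pending.pop(tx_id, [])
        self.applied.extend(members)
        return members


buffer = TransactionBuffer()
buffer.add_member("tx-1", "osm:Node/1")
buffer.add_member("tx-1", "osm:Node/2")
buffer.add_member("tx-1", "osm:Way/10")  # refers to both nodes

visible_before = list(buffer.applied)  # nothing is visible yet
released = buffer.complete("tx-1")
visible_after = list(buffer.applied)   # nodes and way appear together
```

With this approach the osm:Way never becomes visible before the osm:Nodes it points to, at the cost of delaying every member until its transaction closes.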

Any ideas on this?

pietercolpaert commented 8 months ago

I’ll close this issue in favor of https://github.com/SEMICeu/LinkedDataEventStreams/issues/46