federatedbookkeeping / research

Research notes about Federated Bookkeeping and related topics
https://federatedbookkeeping.org
MIT License

UUIDs #40

Open michielbdejong opened 2 months ago

michielbdejong commented 2 months ago

It is said that the creator of ~an item~ a piece of content can choose the UUID.

However, what if you want to merge two observations of one real-world event? Shouldn't we deduplicate through the real world?

tantaman commented 2 months ago

This book, https://michaelperry.net/the-art-of-immutable-architecture/, has a chapter dedicated to these sorts of things under "historical modeling".

But, yeah, I'd add a second index which is not unique. The developer would have to deal with many copies of the same thing with the same key. You'd need to form some sort of causal relationship so you can determine which copy comes first. Historical modeling covers that latter bit in detail where the relationship is a function of your domain.
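A minimal sketch of that second, non-unique index, assuming an in-memory store. The names `Item`, `Store`, and `real_world_key` are hypothetical and not taken from the book or any library; the key could be something like an invoice number that both observers happen to record.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    uid: str             # unique, chosen by whoever created the item
    real_world_key: str  # hypothetical non-unique key shared by all copies
    payload: str


class Store:
    def __init__(self):
        self.by_uid = {}                 # primary index: exactly one item per UID
        self.by_key = defaultdict(list)  # secondary index: many UIDs per key

    def add(self, item):
        if item.uid in self.by_uid:
            raise ValueError(f"duplicate UID {item.uid}")
        self.by_uid[item.uid] = item
        self.by_key[item.real_world_key].append(item.uid)

    def copies(self, key):
        """Return every item recorded under the same real-world key."""
        return [self.by_uid[uid] for uid in self.by_key.get(key, [])]
```

The store only surfaces the copies; deciding which copy comes first, or how to merge them, is still left to the application.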

More general options would be some sort of causal graph where each item refers to parent events. E.g., git, diamond types, https://vlcn.io/blog/crdt-substrate
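As a rough sketch of the causal-graph idea, assuming each event simply lists the UIDs of its parents (the `Event` class and the helper functions are illustrative names, not the data model of git, diamond types, or vlcn):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    uid: str
    parents: tuple = ()  # UIDs of the events this one was created on top of


def ancestors(events, uid):
    """Return the set of UIDs reachable from `uid` by following parent links."""
    seen = set()
    stack = list(events[uid].parents)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(events[current].parents)
    return seen


def happened_before(events, a, b):
    """True if event `a` is a causal ancestor of event `b`."""
    return a in ancestors(events, b)


def concurrent(events, a, b):
    """True if neither event causally precedes the other."""
    return (
        a != b
        and not happened_before(events, a, b)
        and not happened_before(events, b, a)
    )
```

Here `events` is assumed to be a dict from UID to `Event`. Two copies that turn out to be concurrent are exactly the case where the graph alone cannot say which one comes first.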

michielbdejong commented 2 months ago

Right! I think there is a subtle distinction here.

To refer to a piece of content, assign it an immutable Unique ID (UID). [...] By “piece of content”, I mean anything that the user views as a distinct “thing”, with its own long-lived identity: an ingredient, a spreadsheet cell, a document, etc. Note that the content may be internally mutable. Other analogies:

  • Anything that would be its own object in a single-user app. Its UID is the distributed version of a “pointer” to that object.
  • Anything that would get its own row in a normalized database table. Its UID functions as the primary key.

So I think "piece of content" is the right term in this context, and it is defined at the level of the app GUI or inside the code / database. It comes into existence when the user makes a certain UI gesture.

But suppose I add the Eiffel Tower to our database and you do too. Then there would be two distinct pieces of content, but neither of us created the actual Eiffel Tower through our UI gestures, so we would still want to deduplicate that.

Systems like git act on syntactic equivalence: if we both write the same line of text, byte for byte, then there is no conflict. But I'm also interested in semantic equivalence: if we each assign an identifier to something that exists independently of both of us, then we have created two pieces of content, but these pieces of content refer to the same thing in the physical world or some other outside world.
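One way to picture the difference, as a sketch only: syntactic identity can be computed from the bytes, whereas semantic identity has to be declared by someone and then tracked. The `content_id` function and the `SameAs` class below are hypothetical; `SameAs` is an ordinary union-find over UIDs, swapped in here for illustration and not something proposed in this thread.

```python
import hashlib


def content_id(text):
    """Syntactic identity: identical bytes always produce the same ID."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class SameAs:
    """Semantic identity: groups UIDs that were declared to denote the same thing."""

    def __init__(self):
        self.parent = {}

    def find(self, uid):
        """Return the representative UID of the group that `uid` belongs to."""
        self.parent.setdefault(uid, uid)
        while self.parent[uid] != uid:
            self.parent[uid] = self.parent[self.parent[uid]]  # path halving
            uid = self.parent[uid]
        return uid

    def declare_same(self, a, b):
        """Record a claim that UIDs `a` and `b` refer to the same outside thing."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # The smallest UID becomes the representative, so the result does
            # not depend on the order in which declarations arrive.
            low, high = sorted((root_a, root_b))
            self.parent[high] = low

    def same_thing(self, a, b):
        return self.find(a) == self.find(b)
```

With this, two separately created entries keep their own UIDs, and `declare_same("uid-alice", "uid-bob")` records that both point at the same tower. Nothing in the code can discover that link on its own; a person or some external authority has to assert it.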