JeffreyBenjaminBrown / hode

rslt, take five-ish
GNU General Public License v3.0

Merges and automatic disambiguation #12

Open JeffreyBenjaminBrown opened 3 years ago

JeffreyBenjaminBrown commented 3 years ago

[This duplicates what I just posted to Twitter, in response to @mwotton who wrote https://twitter.com/mwotton/status/1383429467306008585.]

Yes [there's a good way to handle merges], although it's not coded yet. Say you wrote about (the) Ben Franklin, and I about my cousin Ben Franklin. On merging our data, the two would by default look like one expression, decorated with a note that there might be a conflict. 1/4

If you later notice a mismatch -- "Franklin never went to Arizona, what the hell?" -- you could automatically split them into "#the Ben Franklin #(written about by) Jeff" and "#the Ben Franklin #(written about by) Mark", because Hode had invisibly retained their provenance. 2/4

Optionally, you could then rename your Ben Franklin in a completely disambiguating way, such as "#the Ben Franklin #(described by) https://es.wikipedia.org/wiki/Benjamin_Franklin". 3/4

There does arise the possibility that our data contradict each other. Even with true AI, that would be impossible to avoid automatically. But the data model does not permit a merge to create an invalid state. 4/4
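A rough sketch of the merge-then-split idea in Python (nothing like this exists in Hode yet). The names `Statement`, `merge`, `possible_conflict` and `split_by_author` are hypothetical, and the "bifocals" fact is made up for illustration. The only point is that each statement keeps its author, so a merged expression can be split later:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One relationship about a subject, tagged with who wrote it."""
    subject: str
    relation: str
    obj: str
    author: str  # provenance; normally hidden from the reader


def merge(*graphs):
    """Union the graphs; statements with the same subject label share one expression."""
    merged = defaultdict(list)
    for graph in graphs:
        for st in graph:
            merged[st.subject].append(st)
    return dict(merged)


def possible_conflict(statements):
    """True if more than one author wrote about this expression."""
    return len({st.author for st in statements}) > 1


def split_by_author(label, statements):
    """Split one merged expression into one expression per author."""
    split = defaultdict(list)
    for st in statements:
        split[f"#the {label} #(written about by) {st.author}"].append(st)
    return dict(split)


mark = [Statement("Ben Franklin", "invented", "bifocals", "Mark")]
jeff = [Statement("Ben Franklin", "lives in", "Arizona", "Jeff")]

merged = merge(mark, jeff)                        # one "Ben Franklin" expression
flagged = possible_conflict(merged["Ben Franklin"])
parts = split_by_author("Ben Franklin", merged["Ben Franklin"])
```

Here `flagged` is the "there might be a conflict" decoration, and `parts` has one key per author, so the Arizona statement ends up only under Jeff's Ben Franklin.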

JeffreyBenjaminBrown commented 3 years ago

And regarding deletions:

The trick would be to not actually delete, but rather record the fact that you deleted it, and not I. Your view of the data would by default hide those "deleted" relationships, but with an indicator that something is hidden. 1/2

My view would show them, and they would each be in a relationship of the form "Mark #deleted R", where R is the relationship you deleted. I might by default use a filter that replaces any number of relationships of the form "P #deleted R" (any person P, any relationship R) with a single flag, to save space. 2/2
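To make the deletion idea concrete, here is a minimal Python sketch, assuming a graph is just a list of relationships plus deletion records. `Rel`, `Deleted` and `view` are hypothetical names, and the bracketed flag strings are placeholders for whatever decoration the UI would use:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rel:
    text: str  # e.g. "X #r Y"


@dataclass(frozen=True)
class Deleted:
    who: str     # the person who deleted the relationship
    target: Rel  # the relationship itself, which stays in the graph


def view(graph, viewer, collapse=False):
    """Render the graph as a list of lines for one viewer."""
    deletions = [x for x in graph if isinstance(x, Deleted)]
    # Relationships the viewer deleted are hidden from the viewer only.
    hidden = {d.target for d in deletions if d.who == viewer}
    lines = [x.text for x in graph if isinstance(x, Rel) and x not in hidden]
    if hidden:
        lines.append(f"[{len(hidden)} hidden]")  # indicator that something is hidden
    # Other people's deletions are shown, either one by one or as a single flag.
    others = [d for d in deletions if d.who != viewer]
    if collapse and others:
        lines.append(f"[{len(others)} deleted by others]")
    else:
        lines.extend(f"{d.who} #deleted ({d.target.text})" for d in others)
    return lines


graph = [Rel("X #r Y"), Rel("X #r Z"), Deleted("Mark", Rel("X #r Y"))]
marks_view = view(graph, "Mark")
jeffs_view = view(graph, "Jeff")
jeffs_compact_view = view(graph, "Jeff", collapse=True)
```

Nothing is ever removed from `graph`. Each viewer just gets a different rendering of the same data.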

JeffreyBenjaminBrown commented 3 years ago

[And what about deletions?]

It's all future work ;) And it's a deep question. There's a good chance I'll want your edit, esp. if you wrote the original.

The fact that you replaced X with Y seems important to retain -- more informative than "deleted X" and "added Y" in isolation. If I saw only the deletion I might disagree with it, whereas if I saw the full edit I'd see you were correct.

A good default might be to just keep the new one in my graph, with a flag (i.e. a relationship abbreviated to a sidebar decoration) indicating that it has history, and maybe a louder flag if you edited something I wrote.

Or maybe people I trust get that treatment, and for ones I don't, their edits only translate into a flag, while my graph still shows my original data.
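A sketch of how that trust-based default might look in Python, assuming an edit is stored as one record holding both the old and the new relationship. `Edit`, `apply_edit` and the wording of the flags are all hypothetical, and the "louder flag" for edits to my own writing is not modeled:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    author: str
    old: str  # the relationship that was replaced
    new: str  # the relationship that replaced it


def apply_edit(graph, edit, trusted):
    """Return (graph, flags) after considering one edit from someone else.

    Trusted author: show the new relationship, flagged as having history.
    Untrusted author: keep my original, flagged with the proposed change.
    """
    if edit.old not in graph:
        return list(graph), []
    if edit.author in trusted:
        updated = [edit.new if rel == edit.old else rel for rel in graph]
        return updated, [(edit.new, f"edited by {edit.author}, was: {edit.old}")]
    return list(graph), [(edit.old, f"{edit.author} proposes: {edit.new}")]


mine = ["X #r Y", "A #s B"]
edit = Edit("Mark", "X #r Y", "X #r Z")

trusting = apply_edit(mine, edit, trusted={"Mark"})
wary = apply_edit(mine, edit, trusted=set())
```

Either way the full edit (old and new together) survives in the flag, so I can always see what was replaced with what.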

But what if you change "X #r Y" to "X #r Z", and I already had both relationships in my graph? Then I'd probably want "Mark #deleted (X #r Y)" visible by default, and "Mark #(also wrote) (X #r Z)" available if I want to see it.

I'm sure I haven't worked through all possibilities, but yes, as you say, event sourcing seems like the way.
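For what it's worth, here is a tiny event-sourcing sketch in Python covering the case where I already had both relationships. The event types `Added` and `Replaced` and the function `replay` are hypothetical. It only handles other people's edits and leaves every other case as a plain note:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Added:
    author: str
    rel: str


@dataclass(frozen=True)
class Replaced:
    author: str
    old: str
    new: str


def replay(log, me):
    """Fold the event log into (visible, on_demand) lists for viewer `me`."""
    visible, on_demand = [], []
    for ev in log:
        if isinstance(ev, Added):
            if ev.rel not in visible:
                visible.append(ev.rel)
        elif isinstance(ev, Replaced) and ev.author != me:
            if ev.old in visible and ev.new in visible:
                # I already had both: show the deletion, tuck away the rest.
                visible.append(f"{ev.author} #deleted ({ev.old})")
                on_demand.append(f"{ev.author} #(also wrote) ({ev.new})")
            else:
                # Other cases are left open; just keep a note of the edit.
                on_demand.append(f"{ev.author} replaced {ev.old!r} with {ev.new!r}")
    return visible, on_demand


log = [
    Added("Jeff", "X #r Y"),
    Added("Jeff", "X #r Z"),
    Replaced("Mark", "X #r Y", "X #r Z"),
]
visible, on_demand = replay(log, me="Jeff")
```

Because the log is never rewritten, a different default later on would only mean changing `replay`, not the stored data.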