automerge / automerge-classic

A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically.
http://automerge.org/
MIT License
14.75k stars 466 forks source link

Looking for a minimal implementation of Automerge/CRDTs #287

Closed nettybun closed 3 years ago

nettybun commented 3 years ago

Hello!

Sorry for the essay - TL;DR: Do you know of any small CRDTs like the Micromerge you wrote?

Thanks for all your work on Automerge. I've been researching CRDTs as a way of building git-like local-first software and feel like I've come to the right place. It's been very interesting to learn from watching conference talks (including yours!) and reading papers that are referenced in CRDT codebases.

Along with the local-first mindset, I also like to build apps/libraries that are small and more easily understandable, hackable, and interoperable. There's some overlap with those who care about bundle size too; devs who choose lighter libraries options like Preact/NanoID/Svelte/Sinuous may be doing so not only because it helps push against web bloat, but also that there's a better chance they (and future devs to come) will be able to wrap their head around the codebase.

I've been trying to find a small basic CRDT implementation but haven't had much luck. I know CRDTs have a certain level of essential-complexity (especially when focused on efficient text editing), but I know many projects would be happy to have basic list syncing (i.e todo lists, tictactoe, or pixelpusher-like apps that don't focus on text). Similarly, I've been surprised to not yet stumble on something that vibes well with vanilla JS either.

Projects like Automerge, Y.js, and Logux all have a lot of features. That's great for devs who want them, but adds a lot of complexity to apps that are trying to be small. For instance, if I have a 3kb (min+gzip) app that is an orderable todo list with websockets, then Automerge is ~20x larger than the entire app...

I know this isn't true for a lot of React+Redux apps that are already very large and buying into concepts like immutable data. I think it's fair to assume developers might choose those technologies; even Logux is built around assumption that you'll use Redux to implement undo/redo.

Still, I think having small CRDTs will help a lot with their adoption! If I can tell other devs "Paste these 300 lines of code and you'll have basic collaborative syncing" that's huge. Especially if there's a chance they can understand and hack on it. I have actually found this! In an article, https://josephg.com/blog/crdts-are-the-future/

Martin managed to make a tiny, slow implementation of automerge in only about 100 lines of code.

Leading me to your Micromerge CRDT https://github.com/automerge/automerge/blob/performance/test/fuzz_test.js. It looks great. I'm going to move forward with this and try writing some code to help in the manual construction of changes; https://github.com/heyheyhello/etch/blob/work/packages/shared/crdt.ts

I'm writing to you to ask if you've considered more formally publishing something like Micromerge, since there are developers (and article writers) who are looking for examples of small CRDTs. If not, are there small CRDTs you have in mind that you could link to which are more suitable for micro-frameworks?

Thanks!

nettybun commented 3 years ago

This would also help cross platform porting like #285. In that thread, you linked to https://gitlab.com/codewitchbella/automerge-client-server/ but it's way more than 300 lines since it imports all of Automerge! 😄

ept commented 3 years ago

Hi @heyheyhello! A basic implementation of many CRDTs can often be quite simple. I suggest starting not with code, but with academic write-ups of CRDTs (which usually include pseudocode). Some suggested resources:

There are also tons of CRDT implementations that various people have created for learning/as side-projects, in various states of unmaintainedness. I haven't looked at them in any detail though, so I can't say anything about their quality.

It also depends a bit on which datatypes you want. A simple map datatype, for example, might be just 20 lines of code (see my lecture notes for an example). A list is more complicated. A list that allows reordering of elements (as you might want for a todo list) is more complicated still.

In that thread, you linked to https://gitlab.com/codewitchbella/automerge-client-server/ but it's way more than 300 lines since it imports all of Automerge!

True, although as I point out in #285, it is not actually necessary for the server to import Automerge (assuming the server only provides storage and networking, and doesn't actively modify the document). It would not be hard to write a minimal server implementation whose only dependency is a WebSocket implementation.

echarles commented 3 years ago

@ept I was just looking for a simple example to start with (if possible react based)

https://github.com/automerge/pushpin is a electron app and uses hypermerge, so maybe more difficult to try. https://github.com/automerge/pixelpusher is react but not maintainer since 3 years (not sure if it will work)

Any other place where we could find that very simple and basic, yet functional, example with e.g. a react.js based ui?

nettybun commented 3 years ago

@ept Thanks for the resources. I'll do more research and will try to implement my own CRDT as you suggest.

I've looked into many CRDT repos (and have been down the rabbithole of the GitHub search you linked) and will try again but in my experience they were all difficult to reverse-engineer; many don't link to the paper they're implementing either. I've seen a few that are very... academic and spend a good deal of code on type systems and axioms without explaining why.

The writeup from James Long was a very interesting read. I hadn't heard someone use Hybrid Logical Clocks before or merkle trees. I'll read the HLC paper and compare it to the Lamport Timestamp paper to try to figure out why to use one over the other. I remember seeing the other day in #284 how to determine changes to sync, so I'll read on merkle trees and try to pick a method that's easiest for non-academic devs to understand.

True, although as I point out in #285, it is not actually necessary for the server to import Automerge

Right! That makes sense :)

nettybun commented 3 years ago

@echarles We're mostly talking about libraries that implement simple CRDT algorithms here, not apps that use them. You can see some repos that use Automerge here https://github.com/automerge/automerge/network/dependents

I'll be honest I had to go 5 pages back before I found something that might appeal - a lot of the things that use Automerge are complex (videochat or text editors), libraries not apps, or are apps that aren't using it anymore but are still in the list somehow...

I found this for you: https://github.com/HerbCaudill/todomvc/ which uses React+Redux

HerbCaudill commented 3 years ago

@heyheyhello @echarles That project (https://github.com/HerbCaudill/todomvc/ ) was kind of a throwaway proof of concept - not totally sure what state it's in.

If you're looking for a basic working example, todo demo in Cevitxe is a better place to start: clone http://github.com/DevResults/cevitxe , then yarn dev:todo:start.

echarles commented 3 years ago

Wonderful. Thx all and sorry for hijacking this issue/discussion.

nettybun commented 3 years ago

Update: Found this gem https://github.com/jlongster/crdt-example-app ✨ from James Long which implements todo list CRDT in 600 lines using the HLC and merkle tree methods. There's also annotated notes (!) to explain how it all works. https://github.com/clintharris/crdt-example-app_annotated/blob/master/NOTES.md I'll close this, thanks all!