automerge / automerge-classic

A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically.
http://automerge.org/
MIT License

Python Implementation of the Automerge Server? #285

Open echarles opened 3 years ago

echarles commented 3 years ago

Hi, I am working on the Jupyter Realtime Collaboration project (https://github.com/jupyterlab/rtc), and one of the requirements is to avoid Node.js as a dependency for users (Python is the only dependency we allow).

I understand that the Automerge Server requires Node.js.

Are there any plans, or ways, to have a Python implementation of the Automerge server?

echarles commented 3 years ago

Maybe Python bindings for automerge-rs (https://github.com/automerge/automerge-rs) would be a viable option? (cc @anirrudh)

ept commented 3 years ago

Hi @echarles! There is no Automerge server, since Automerge is a client-only library. It can be used with any one of a range of different server or peer-to-peer implementations, as explained in the "sending and receiving changes" section of the README.

One example of a server you can use with Automerge is this third-party implementation. It does indeed use Node.js, but it's less than 300 lines of code, and would be pretty straightforward to port to Python or any other language.

The server generally only needs to store messages and forward them from one client to the others, but it doesn't need to parse or process those messages in any particular way. If you do need the server to participate actively, then you would need to run Automerge on the server too, in which case using the Rust implementation through Python bindings would be a route worth exploring. However, if the server just passively forwards messages, then there is no need to bind to the Rust implementation.
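To make the store-and-forward idea concrete, here is a minimal in-memory sketch in Python (the language the Jupyter project wants to stay in). `RelayServer`, its method names, and the callback-based delivery are all hypothetical; a real server would use WebSockets or similar, plus durable storage. The point is that the server never looks inside a change: it only appends bytes to a per-document log and passes them on to the other connected clients.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical: a client is represented by a callback that delivers bytes
# to it (in a real server this would be a network connection).
Deliver = Callable[[bytes], None]


class RelayServer:
    """Persists and forwards changes without ever parsing them."""

    def __init__(self) -> None:
        # doc_id -> opaque change blobs, in arrival order
        self.log: Dict[str, List[bytes]] = defaultdict(list)
        # doc_id -> {client_id: deliver callback}
        self.clients: Dict[str, Dict[str, Deliver]] = defaultdict(dict)

    def connect(self, doc_id: str, client_id: str, deliver: Deliver) -> None:
        self.clients[doc_id][client_id] = deliver
        # Bring the newcomer up to date by replaying the stored log.
        for blob in self.log[doc_id]:
            deliver(blob)

    def disconnect(self, doc_id: str, client_id: str) -> None:
        self.clients[doc_id].pop(client_id, None)

    def receive(self, doc_id: str, sender_id: str, blob: bytes) -> None:
        self.log[doc_id].append(blob)  # "persist" (in memory only here)
        for client_id, deliver in self.clients[doc_id].items():
            if client_id != sender_id:  # don't echo back to the sender
                deliver(blob)


inbox_alice: List[bytes] = []
inbox_bob: List[bytes] = []
server = RelayServer()
server.connect("notebook-1", "alice", inbox_alice.append)
server.receive("notebook-1", "alice", b"change-1")      # nobody else online yet
server.connect("notebook-1", "bob", inbox_bob.append)   # bob gets the stored log
server.receive("notebook-1", "bob", b"change-2")        # forwarded to alice only
```

This sketch replays the entire log to a newcomer; a real server would send only the changes that client is missing.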

Hope that helps!

echarles commented 3 years ago

Thanks @ept. Correct: a CRDT involves only peer collaborative actors. In the Jupyter case, I saw the server as just another actor that would need to receive the changes and persist them. Our current implementation is Python-only, hence the need for a Python implementation that can persist the changes.

We also need a way to add authentication and permissions to the loading/editing/saving flow, hence my intuition that the server should be more than a transparent forwarder. Am I right in thinking that we would need to parse and process the Automerge CRDT updates to achieve these goals?

ept commented 3 years ago

You certainly can treat the server as another actor if you want, but if only the clients are going to make edits, then this would be more complicated than necessary. A simpler approach would be a server that only persists and forwards changes, but regards the changes as uninterpreted blobs.

The only thing a server needs to know about a change is its actorId and its sequence number (changes generated by the same actor are numbered sequentially starting with 1). For client-server sync you would use a vector clock. This is a map (dictionary) where the keys are actorIds, and the values are integers indicating how many changes we have seen from that particular actorId (or equivalently, the greatest sequence number seen from that actor). When a client connects to the server, it sends this vector clock, based on which the server can work out which changes the client hasn't yet seen, and send those changes to the client.
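The vector-clock comparison described above fits in a few lines. This sketch assumes, hypothetically, that each client sends the actorId and sequence number alongside each change blob (for example in a small message envelope), so the server never has to decode the change itself. `StoredChange`, `server_clock`, and `missing_changes` are illustrative names, not Automerge APIs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StoredChange:
    actor: str   # actorId of the change's author
    seq: int     # per-actor sequence number: 1, 2, 3, ...
    blob: bytes  # the change itself, never parsed by the server


def server_clock(changes: List[StoredChange]) -> Dict[str, int]:
    """Vector clock: the greatest sequence number seen from each actor."""
    clock: Dict[str, int] = {}
    for c in changes:
        clock[c.actor] = max(clock.get(c.actor, 0), c.seq)
    return clock


def missing_changes(changes: List[StoredChange],
                    client_clock: Dict[str, int]) -> List[StoredChange]:
    """Changes the client hasn't seen, in the server's stored order.
    An actor absent from the client's clock counts as 0 changes seen."""
    return [c for c in changes if c.seq > client_clock.get(c.actor, 0)]


log = [
    StoredChange("alice", 1, b"a1"),
    StoredChange("alice", 2, b"a2"),
    StoredChange("bob", 1, b"b1"),
]
# This client has seen alice's first change but nothing from bob.
to_send = missing_changes(log, {"alice": 1})
```

The server can also send its own `server_clock(log)` to the client, letting the client work out in the same way which of its local changes the server is missing.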

For authentication and permissions, any standard scheme (e.g. passwords, cookies) will work. That's part of the reason why we don't provide a server with Automerge out of the box: the requirements around authentication and persistence tend to be different for every app, so it's difficult to have one server implementation that fits all apps. But fortunately this stuff is very standard and Automerge doesn't require anything unusual.
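Because the server treats changes as opaque blobs, permissions in this setup naturally apply per document: the server can decide whether a user may load a document or submit changes to it, but not which parts of the document a change touches. A minimal sketch, assuming the user has already been authenticated by some standard scheme (passwords, cookies, etc.); the permission table and all function names are hypothetical.

```python
from enum import Enum
from typing import Dict, List


class Access(Enum):
    READ = 1
    WRITE = 2


# Hypothetical permission table: user -> doc_id -> access level.
# In practice this would come from the app's existing auth system.
PERMISSIONS: Dict[str, Dict[str, Access]] = {
    "alice": {"notebook-1": Access.WRITE},
    "bob": {"notebook-1": Access.READ},
}


def can_read(user: str, doc_id: str) -> bool:
    # Any grant (READ or WRITE) allows loading and receiving changes.
    return doc_id in PERMISSIONS.get(user, {})


def can_write(user: str, doc_id: str) -> bool:
    return PERMISSIONS.get(user, {}).get(doc_id) is Access.WRITE


def accept_change(user: str, doc_id: str, blob: bytes,
                  log: List[bytes]) -> bool:
    """Store the change only if the authenticated user may edit the doc.
    The blob itself is never inspected."""
    if not can_write(user, doc_id):
        return False
    log.append(blob)
    return True
```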

echarles commented 3 years ago

@ept Very helpful insights! I need to digest all of this and try it out hands-on.

One more question: What is the goal of the backend folder in the source tree? https://github.com/automerge/automerge/tree/main/backend

ept commented 3 years ago

No problem, good luck and please let us know how you get on!

What is the goal of the backend folder in the source tree?

Automerge is split into a frontend (which provides user APIs for reading and updating a document) and a backend (which contains most of the CRDT logic). Both parts are designed to run client-side in the same process, but the idea is that you can run them on two different threads: the frontend on the render thread along with the UI, and the backend on a background thread. This allows better responsiveness since some of the CRDT operations can be slow, and having the backend on a separate thread avoids blocking the UI. Thus, in this context, "backend" does not refer to a server.
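The frontend/backend split is a threading pattern rather than a client/server one. The following Python sketch is only an analogy for its shape, using a background thread and two queues: the frontend applies an edit to its local view immediately and hands the work to the backend, which sends back patches that the frontend applies whenever it next polls. `backend_worker` and `Frontend` are hypothetical Python names, not Automerge's own API, and the dictionary update stands in for the real CRDT logic.

```python
import queue
import threading
from typing import Dict, Optional


def backend_worker(requests: "queue.Queue[Optional[dict]]",
                   patches: "queue.Queue[dict]") -> None:
    """Background thread: owns the document state and does the slow work."""
    state: Dict[str, object] = {}
    while True:
        req = requests.get()
        if req is None:          # sentinel: shut down
            break
        state.update(req)        # stand-in for the real CRDT logic
        patches.put(dict(req))   # tell the frontend what changed


class Frontend:
    """Runs on the UI thread; never blocks waiting for the backend."""

    def __init__(self) -> None:
        self.view: Dict[str, object] = {}
        self.requests: "queue.Queue[Optional[dict]]" = queue.Queue()
        self.patches: "queue.Queue[dict]" = queue.Queue()
        self.backend = threading.Thread(
            target=backend_worker, args=(self.requests, self.patches))
        self.backend.start()

    def change(self, update: dict) -> None:
        self.view.update(update)   # local update, visible immediately
        self.requests.put(update)  # hand the heavy work to the backend

    def poll(self) -> None:
        # Apply whatever patches are ready, without blocking.
        while True:
            try:
                patch = self.patches.get_nowait()
            except queue.Empty:
                return
            self.view.update(patch)

    def close(self) -> None:
        self.requests.put(None)
        self.backend.join()


fe = Frontend()
fe.change({"title": "Hello"})
fe.close()  # waits until the backend has processed the queued change
fe.poll()
```

In this toy the patch simply repeats the local edit; in a real system the backend's patches would also carry changes that arrived from other users.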

HerbCaudill commented 3 years ago

@echarles I think it's helpful to break down the services that a server traditionally provides and rethink how you might meet those needs in a peer-to-peer context.

echarles commented 3 years ago

Relaying

Agreed. We already have that notion in our experiments: https://github.com/jupyterlab/rtc/tree/main/packages/relay

All it needs to do is establish that Alice and Bob are interested in the same topic,

Is it completely Automerge-agnostic? I would have expected interest in a topic to be determined by the content of the Automerge CRDT messages.

Availability

I also agree with your explanation. Beyond availability, a server also ensures persistence (in Jupyter's case, persistence of the notebooks).

Authentication

+1

HerbCaudill commented 3 years ago

Is it completely automerge agnostic? I would expect the interest in a topic resided in the content of the automerge CRDT messages?

No, the "topic" in this sense generally represents one Automerge document or a set of Automerge documents (e.g. a DocSet, or what Hypermerge and Cevitxe both refer to as a repository).
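A relay along these lines only needs to match peers by topic; it never reads the messages. A minimal sketch, with hypothetical names (`TopicRegistry`, and a topic string such as `"repo:lab-notebooks"` naming a set of documents), of how the relay might introduce Alice and Bob once both have joined the same topic:

```python
from collections import defaultdict
from typing import Dict, List, Set


class TopicRegistry:
    """Introduces peers that share a topic; never sees document content.

    A topic is an opaque string naming one Automerge document or a set of
    documents (e.g. a whole repository of notebooks).
    """

    def __init__(self) -> None:
        self.members: Dict[str, Set[str]] = defaultdict(set)

    def join(self, peer: str, topic: str) -> List[str]:
        """Register the peer and return the peers it should connect to."""
        others = sorted(self.members[topic] - {peer})
        self.members[topic].add(peer)
        return others

    def leave(self, peer: str, topic: str) -> None:
        self.members[topic].discard(peer)


reg = TopicRegistry()
first = reg.join("alice", "repo:lab-notebooks")   # nobody to meet yet
second = reg.join("bob", "repo:lab-notebooks")    # bob is introduced to alice
```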