Python Implementation of the Automerge Server?

echarles commented 3 years ago

Hi, I am working on the Jupyter Realtime Collaboration project (https://github.com/jupyterlab/rtc), and one of our requirements is to avoid Node.js as a dependency for the user (Python is our only dependency).

I understand that the Automerge Server requires Node.js.

Are there any plans, or ways, to have a Python implementation of the Automerge server?

echarles commented 3 years ago

Would Python bindings to automerge-rs (https://github.com/automerge/automerge-rs) be a viable option? (cc @anirrudh)

ept commented 3 years ago

Hi @echarles! There is no Automerge server, since Automerge is a client-only library. It can be used with any one of a range of different server or peer-to-peer implementations, as explained in the "sending and receiving changes" section of the README.

One example of a server you can use with Automerge is this third-party implementation. It does indeed use Node.js, but it's less than 300 lines of code, and would be pretty straightforward to port to Python or any other language.

The server generally only needs to store messages and forward them from one client to the others, but it doesn't need to parse or process those messages in any particular way. If you do need the server to participate actively, then you would need to run Automerge on the server too, in which case using the Rust implementation through Python bindings would be a route worth exploring. However, if the server just passively forwards messages, then there is no need to bind to the Rust implementation.

Hope that helps!

echarles commented 3 years ago

Thanks @ept. Correct: a CRDT has only peer collaborative actors. In the Jupyter case, I was seeing the server as just another actor that would need to be updated with the changes and persist them. Our current implementation is Python-only, hence the need for a Python implementation that can persist the changes.

We also need a way to add authentication and permissions to the loading/editing/saving flow, hence my intuition that the server should be more than a transparent forwarder. Am I correct to think we would need to parse and process the Automerge CRDT updates to achieve such goals?

ept commented 3 years ago

You certainly can treat the server as another actor if you want, but if only the clients are going to make edits, then this would be more complicated than necessary. A simpler approach would be a server that only persists and forwards changes, but regards the changes as uninterpreted blobs.

The only thing a server needs to know about a change is its actorId and its sequence number (changes generated by the same actor are numbered sequentially starting with 1). For client-server sync you would use a vector clock. This is a map (dictionary) where the keys are actorIds, and the values are integers indicating how many changes we have seen from that particular actorId (or equivalently, the greatest sequence number seen from that actor). When a client connects to the server, it sends this vector clock, based on which the server can work out which changes the client hasn't yet seen, and send those changes to the client.
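To make the vector-clock exchange concrete, here is a minimal Python sketch of a server-side store that treats changes as uninterpreted blobs. It assumes each change arrives together with its actorId and sequence number (how those are transmitted is not specified here) and uses the standard-library `sqlite3` module for persistence. `ChangeStore` and its methods are hypothetical names for illustration, not part of Automerge.

```python
import sqlite3


class ChangeStore:
    """Hypothetical server-side store: changes are opaque blobs keyed by (actor, seq)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS changes ("
            "actor TEXT NOT NULL, seq INTEGER NOT NULL, blob BLOB NOT NULL, "
            "PRIMARY KEY (actor, seq))"
        )

    def latest_seq(self, actor: str) -> int:
        """Greatest sequence number stored for this actor (0 if none)."""
        row = self.db.execute(
            "SELECT MAX(seq) FROM changes WHERE actor = ?", (actor,)
        ).fetchone()
        return row[0] or 0

    def clock(self) -> dict[str, int]:
        """The server's vector clock: actorId -> greatest sequence number seen."""
        return dict(self.db.execute("SELECT actor, MAX(seq) FROM changes GROUP BY actor"))

    def add(self, actor: str, seq: int, blob: bytes) -> bool:
        """Store a change without parsing it; return False for a duplicate."""
        latest = self.latest_seq(actor)
        if seq <= latest:
            return False  # already stored
        if seq != latest + 1:
            # Sequence numbers per actor start at 1 and must stay contiguous.
            raise ValueError(f"gap from {actor}: expected seq {latest + 1}, got {seq}")
        with self.db:  # commit the insert as one transaction
            self.db.execute("INSERT INTO changes VALUES (?, ?, ?)", (actor, seq, blob))
        return True

    def missing(self, client_clock: dict[str, int]) -> list[tuple[str, int, bytes]]:
        """Changes the client has not seen, in the order the server received them."""
        rows = self.db.execute("SELECT actor, seq, blob FROM changes ORDER BY rowid")
        return [(a, s, b) for a, s, b in rows if s > client_clock.get(a, 0)]
```

For example, after storing changes 1 and 2 from `alice` and change 1 from `bob`, a client that reconnects with the clock `{"alice": 1}` gets back Alice's second change and Bob's first. Rejecting gaps keeps each actor's sequence contiguous, which is what lets a single integer per actor summarize everything seen. The sketch makes no ordering guarantee beyond arrival order.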

For authentication and permissions, any standard scheme (e.g. passwords, cookies) will work. That's part of the reason why we don't provide a server with Automerge out of the box: the requirements around authentication and persistence tend to be different for every app, so it's difficult to have one server implementation that fits all apps. But fortunately this stuff is very standard and Automerge doesn't require anything unusual.
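As a rough illustration of that point, here is a hypothetical document-level permission check in Python. It assumes the user has already been authenticated by some standard scheme (not shown); the `ACL` table, `is_allowed` and `accept_change` are made-up names. The check runs before a change is stored, and the change itself is never parsed.

```python
# Hypothetical access-control list: document id -> user -> rights on that document.
ACL: dict[str, dict[str, set[str]]] = {
    "notebook-1": {"alice": {"read", "write"}, "bob": {"read"}},
}


def is_allowed(user: str, doc_id: str, right: str) -> bool:
    """True if an (already authenticated) user holds `right` on doc_id."""
    return right in ACL.get(doc_id, {}).get(user, set())


def accept_change(user: str, doc_id: str, change: bytes, log: dict[str, list[bytes]]) -> bool:
    """Append an opaque change to the document's log only if the user may write."""
    if not is_allowed(user, doc_id, "write"):
        return False  # reject before storing or forwarding anything
    log.setdefault(doc_id, []).append(change)  # the change itself is never parsed
    return True
```

The trade-off is granularity: with opaque changes the server can allow or refuse access to a whole document per user, but restricting which parts of a document a user may edit would require interpreting the change contents.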

echarles commented 3 years ago

@ept Very helpful insights! I need to digest all of this and try it out hands-on.

One more question: What is the goal of the backend folder in the source tree? https://github.com/automerge/automerge/tree/main/backend

ept commented 3 years ago

No problem, good luck and please let us know how you get on!

What is the goal of the backend folder in the source tree?

Automerge is split into a frontend (which provides user APIs for reading and updating a document) and a backend (which contains most of the CRDT logic). Both parts are designed to run client-side in the same process, but the idea is that you can run them on two different threads: the frontend on the render thread along with the UI, and the backend on a background thread. This allows better responsiveness since some of the CRDT operations can be slow, and having the backend on a separate thread avoids blocking the UI. Thus, in this context, "backend" does not refer to a server.
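The frontend/backend split can be pictured with plain Python threads and queues. This is only an analogy under stated assumptions: the dictionary update stands in for Automerge's CRDT processing, and `backend_loop`, the request tuples and the patch dictionaries are hypothetical, not Automerge's actual frontend/backend protocol.

```python
import queue
import threading


def backend_loop(requests: queue.Queue, patches: queue.Queue) -> None:
    """Background thread: owns the document state and does the (slow) processing."""
    state: dict[str, str] = {}
    while True:
        request = requests.get()
        if request is None:  # sentinel: shut down
            break
        key, value = request
        state[key] = value  # stand-in for the expensive CRDT logic
        patches.put({key: value})  # report the resulting change to the frontend


requests: queue.Queue = queue.Queue()
patches: queue.Queue = queue.Queue()
backend = threading.Thread(target=backend_loop, args=(requests, patches), daemon=True)
backend.start()

# Frontend (UI thread): hand the edit to the backend instead of processing it inline.
requests.put(("title", "My notebook"))
view: dict[str, str] = {}
# A real UI would poll for patches from its event loop rather than wait here.
view.update(patches.get(timeout=1))
requests.put(None)
backend.join()
```

The frontend never touches the backend's state directly; it only sends requests and applies the patches it receives, which is what allows the two halves to live on different threads. In CPython the GIL means a background thread improves responsiveness rather than raw throughput.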

HerbCaudill commented 3 years ago

@echarles I think it's helpful to break down the services that a server traditionally provides and rethink how you might meet those needs in a peer-to-peer context.

Relaying: One big advantage of a server is simply that it has a stable public IP address, so you never have any trouble making a connection with it from anywhere on the internet. It's theoretically possible for Alice's laptop to communicate directly over the internet with Bob's phone, but with the standards we have today (WebRTC etc.) it's really hard to pull off reliably in practice. It's a lot easier for Alice and Bob both to connect if there's a known, stable, public endpoint that they can each connect to, which can then act as an intermediary. As @ept's example above shows, this kind of "server" can be exceedingly simple - it's not really a server, and it needn't know anything about Automerge. All it needs to do is establish that Alice and Bob are interested in the same topic, and then pipe their sockets together and let them chat away. In Cevitxe we call this a "signal server" - perhaps "relay server" would be more descriptive.
Availability: Another advantage of a server is that it's always turned on and always online. Alice's laptop and Bob's phone may well be switched off or offline at any given time, and if they can only synchronize when they're both online, it can be hard to work asynchronously. Rather than create a whole new server codebase to address this need, I like the idea of creating an always-on version of your client that you can deploy somewhere, and that can reliably persist its state using a traditional database or something similar. I haven't actually done this; it's on the roadmap for Cevitxe. I'd be curious to know if anyone else has.
Authentication: The relay server approach described above can provide a very basic sort of security, in the sense that it will only connect two peers if they both request the same topic (a.k.a. channel or document key). This can be good enough for many purposes, especially if that key is long and randomly generated. But it doesn't give the kind of fine-grained permissions control that a lot of applications require, and there's not an obvious remedy if it's compromised. One solution would be to add basic OAuth authentication to the signal server; I sketched out a possible design for that here. I eventually decided against that approach in favor of a completely decentralized solution. I've been fleshing this out for the past few months in a separate project called taco-js.
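The relay described under "Relaying" can be sketched in a few lines. This hypothetical in-memory `Relay` class (not Cevitxe's implementation) groups peers by topic and forwards each message, unparsed, to the other peers on the same topic; a real deployment would sit behind WebSockets or similar, which is left out here. The topic key is generated with Python's `secrets.token_urlsafe`, following the suggestion that the key be long and randomly generated.

```python
import secrets
from collections import defaultdict
from typing import Callable

# Hypothetical: a function that delivers bytes to one connected peer.
Send = Callable[[bytes], None]


class Relay:
    """Forwards opaque messages between peers that joined the same topic."""

    def __init__(self) -> None:
        # topic -> {peer_id: send function}
        self.topics: defaultdict[str, dict[str, Send]] = defaultdict(dict)

    def join(self, topic: str, peer_id: str, send: Send) -> None:
        self.topics[topic][peer_id] = send

    def leave(self, topic: str, peer_id: str) -> None:
        peers = self.topics.get(topic, {})
        peers.pop(peer_id, None)
        if not peers:
            self.topics.pop(topic, None)  # forget empty topics

    def forward(self, topic: str, sender_id: str, message: bytes) -> int:
        """Send message to every other peer on the topic; return how many got it."""
        delivered = 0
        for peer_id, send in self.topics.get(topic, {}).items():
            if peer_id != sender_id:
                send(message)  # the relay never inspects the message
                delivered += 1
        return delivered


# Usage: the topic key is long and random, shared out of band by the peers.
topic = secrets.token_urlsafe(32)
alice_inbox: list[bytes] = []
bob_inbox: list[bytes] = []
relay = Relay()
relay.join(topic, "alice", alice_inbox.append)
relay.join(topic, "bob", bob_inbox.append)
delivered = relay.forward(topic, "alice", b"<opaque Automerge change>")
```

Anyone who learns the topic key can join, which is exactly the limitation of this basic scheme described under "Authentication".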

echarles commented 3 years ago

Relaying

Agreed. We have that notion in place in our experiments: https://github.com/jupyterlab/rtc/tree/main/packages/relay

All it needs to do is establish that Alice and Bob are interested in the same topic,

Is it completely Automerge-agnostic? I would expect the interest in a topic to reside in the content of the Automerge CRDT messages.

Availability

I also agree with your explanations. On top of availability, a server also ensures persistence (in the Jupyter case, persistence of the notebooks).

Authentication

+1

HerbCaudill commented 3 years ago

Is it completely Automerge-agnostic? I would expect the interest in a topic to reside in the content of the Automerge CRDT messages.

No, the "topic" in this sense generally represents one Automerge document or a set of Automerge documents (e.g. a DocSet, or what Hypermerge and Cevitxe both refer to as a repository).

automerge/automerge-classic, issue #285