dmonad / crdt-benchmarks

A collection of CRDT benchmarks

Add initial support for automerge-wasm in the benchmark #17

Closed · ept closed this 2 years ago

ept commented 2 years ago

Now that the Rust/Wasm implementation of Automerge is in good shape, I'd like to add it to this benchmark. Here is an initial patch to set it up. At this stage it's more about getting things working than about any particular performance numbers.

I've not yet managed to get the JavaScript generated by wasm-bindgen working with a bundler such as Rollup or Webpack. I guess you'll face a similar challenge with Yrs – did you work out how to make it work? For now I've set up the automerge-wasm tests to simply run the JS files directly in Node without any bundler step. I had to add .js to the end of a bunch of imports in order to make Node happy (hope that's ok; this doesn't seem to break the Rollup setup you're using for the Yjs benchmarks). I've not yet figured out how to run the Wasm benchmarks in a browser.
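For illustration, the kind of change involved looks like this (the module path is made up, not one from this repo):

```js
// Bundlers like Rollup or Webpack resolve extensionless import specifiers,
// but Node's native ES module loader requires the full file name.
// './benchmark-utils.js' is a hypothetical path for illustration.
import { runBenchmark } from './benchmark-utils.js' // works in Node and with Rollup
// import { runBenchmark } from './benchmark-utils' // fails under plain Node
```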

Open questions:

- Do we want to explicitly free Wasm memory after each benchmark? At the moment they're leaking memory as we're not yet using FinalizationRegistry.
- There are a couple of places in the benchmarks where the comparison is not quite fair to Automerge because it does things a bit differently to Yjs. I'll do some thinking about tweaks we could make to the benchmarks to ensure it's a fair comparison.

dmonad commented 2 years ago

Thank you so much Martin!

The numbers look pretty impressive. The update encoding in particular stands out: only 129 kB for the editing trace of 182k characters!

Running these benchmarks in the browser is not too important right now. We've also not figured that out yet for Ywasm.

> Do we want to explicitly free Wasm memory after each benchmark? At the moment they're leaking memory as we're not yet using FinalizationRegistry.

I'm explicitly running Node's garbage collector after each benchmark, so maybe that is already handled. If you figure out a way to make the benchmarks more performant by waiting for an event, we can add this event to the interface.

> There are a couple of places in the benchmarks where the comparison is not quite fair to Automerge because it does things a bit differently to Yjs. I'll do some thinking about tweaks we could make to the benchmarks to ensure it's a fair comparison.

What do you mean specifically?

ept commented 2 years ago

> I'm explicitly running Node's garbage collector after each benchmark.

I don't think that is sufficient. The problem is that memory allocated within Wasm is not managed by the Node GC (to Node the entire Wasm module's memory is just one big byte array); if you have a JS object that holds a reference to Wasm memory, and then that JS object is garbage collected, the system does not know that the corresponding Wasm memory should also be freed, unless you explicitly handle the deallocation via FinalizationRegistry. Presumably Yrs will also need to deal with the same limitation of Wasm. Currently we're relying on the application explicitly calling .free() when a document in Wasm space is no longer needed.
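To illustrate the two approaches (only the .free() method is what wasm-bindgen actually generates; the constructor, pointer field, and deallocator names below are hypothetical stand-ins):

```js
// Option 1: explicit deallocation, which is what we rely on today.
// `create()` and `runEditingTrace()` are hypothetical stand-ins.
const doc = create()
try {
  runEditingTrace(doc)
} finally {
  doc.free() // deterministically releases the Wasm-side allocation
}

// Option 2: tie deallocation to JS garbage collection. The held value
// passed to register() must not reference the wrapper itself, otherwise
// the registration keeps it alive forever. `doc2.ptr` and
// `wasm.__wbg_free_doc` stand in for wasm-bindgen's raw pointer and
// deallocator, whose real names differ.
const registry = new FinalizationRegistry((ptr) => wasm.__wbg_free_doc(ptr))
const doc2 = create()
registry.register(doc2, doc2.ptr, doc2) // third arg enables unregister() on explicit free
```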

> What do you mean specifically?

I need to work through the benchmarks more carefully, but one thing that comes to mind is: the measurement of avgUpdateSize presumes that the way two peers sync up their state is by exchanging a log of individual updates, when in fact the sync protocol may have a more efficient way of bringing two peers in sync with each other. Maybe rather than measuring the size of individual updates, it would be better to measure the total network traffic required to bring two nodes in sync with each other after a certain amount of divergence. That would better reflect the cost of syncing updates in a real application.
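As a sketch of what that metric could look like with Automerge's sync protocol (using the initSyncState / generateSyncMessage / receiveSyncMessage API from the JS package; the document setup is only for illustration):

```js
import * as Automerge from 'automerge'

// Two documents that diverged while disconnected (setup for illustration).
let docA = Automerge.change(Automerge.init(), (d) => { d.left = 'edits from A' });
let docB = Automerge.change(Automerge.init(), (d) => { d.right = 'edits from B' });

let syncA = Automerge.initSyncState();
let syncB = Automerge.initSyncState();
let bytesOnWire = 0;

for (;;) {
  let msgA, msgB;
  [syncA, msgA] = Automerge.generateSyncMessage(docA, syncA);
  [syncB, msgB] = Automerge.generateSyncMessage(docB, syncB);
  if (!msgA && !msgB) break; // neither peer has anything left to send

  if (msgA) {
    bytesOnWire += msgA.byteLength;
    [docB, syncB] = Automerge.receiveSyncMessage(docB, syncB, msgA);
  }
  if (msgB) {
    bytesOnWire += msgB.byteLength;
    [docA, syncA] = Automerge.receiveSyncMessage(docA, syncA, msgB);
  }
}
console.log('total traffic to converge:', bytesOnWire, 'bytes');
```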

dmonad commented 2 years ago

Regarding the GC on Wasm, that makes total sense.

The benchmarks simulate how two clients exchange updates in a live collaboration session. At the beginning of a session, all clients are synced. Every action that manipulates a document should create some kind of update that is sent to the other clients (or server). This should be measured by avgUpdateSize. It basically just shows that the CRDT supports incremental updates, and doesn't require sending the complete document on every change. It should be possible to reconstruct the merged document from the individual updates.
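Concretely, this is the quantity Yjs exposes through its 'update' event, which fires once per transaction with the incremental payload that would be broadcast to peers. A minimal sketch:

```js
import * as Y from 'yjs'

const doc = new Y.Doc()
let totalBytes = 0
let updateCount = 0

// Fired once per transaction with the incremental update as a Uint8Array.
doc.on('update', (update) => {
  totalBytes += update.byteLength
  updateCount += 1
})

doc.getText('text').insert(0, 'hello')
console.log('avgUpdateSize ≈', totalBytes / updateCount)
```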

docSize measures the size of the document as it would be stored in a database. Currently, we don't measure how two clients sync after they have diverged. This is something we could add as a separate benchmark.
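In API terms, docSize corresponds to each library's full snapshot encoding. A small sketch (both calls return a binary encoding in the versions benchmarked here):

```js
import * as Y from 'yjs'
import * as Automerge from 'automerge'

const ydoc = new Y.Doc()
ydoc.getText('text').insert(0, 'hello')
const amDoc = Automerge.change(Automerge.init(), (d) => { d.text = 'hello' })

// Full-document encodings, as they would be persisted in a database:
console.log('Yjs docSize:', Y.encodeStateAsUpdate(ydoc).byteLength)
console.log('Automerge docSize:', Automerge.save(amDoc).byteLength)
```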

dmonad commented 2 years ago

I propose that we add a B5 benchmark suite where we perform some actions on a document while the other side is disconnected and then synchronize using the preferred sync protocol, e.g. the simplest case would be: "perform one insert and then sync".
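A sketch of that reconnect-and-sync shape using Yjs's state-vector API (the eventual B5 suite may well measure this differently):

```js
import * as Y from 'yjs'

// Two peers that start out in sync; A performs one insert while B is offline.
const docA = new Y.Doc()
const docB = new Y.Doc()
docA.getText('text').insert(0, 'x')

// On reconnect, B sends its state vector and A replies with only the diff.
const stateVectorB = Y.encodeStateVector(docB)
const diff = Y.encodeStateAsUpdate(docA, stateVectorB)
Y.applyUpdate(docB, diff)

// Bytes on the wire for this round trip:
console.log(stateVectorB.byteLength + diff.byteLength)
```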

ept commented 2 years ago

> The benchmarks simulate how two clients exchange updates in a live collaboration session. At the beginning of a session, all clients are synced. Every action that manipulates a document should create some kind of update that is sent to the other clients (or server). This should be measured by avgUpdateSize.

That makes sense, but I think there is still a risk of the numbers being misinterpreted. For example, in test B1.1 the avgUpdateSize for Yjs is 27 bytes and for Automerge is 121 bytes. At first glance, it looks like Yjs is 4.5x better. But in an actual live collaboration session, the difference between 27 and 121 bytes is irrelevant because both will fit in a single packet, and the size of the packet will be dominated by the headers of the lower-level protocols (WebSocket, TLS, TCP, IP, Ethernet); the extra 94 bytes used by Automerge make virtually no difference to users. All that matters here is that the update message fits in a single packet.

Hence I'm trying to find metrics that reflect what users experience. But we can do that in a separate PR; for now I'm just happy to have the benchmarks working :)

dmonad commented 2 years ago

You are right that avgUpdateSize can easily be misinterpreted. I'm going to make it clearer what this number refers to.

I'm gonna merge this and publish the updated benchmarks ASAP. I'm currently traveling and a bit short on time.

ept commented 2 years ago

Thanks! No rush.