automerge / automerge-repo

MIT License
458 stars 51 forks source link

Large enough messages block the main thread for a long time, which can lead to socket timeouts on blocked clients #339

Open kid-icarus opened 5 months ago

kid-icarus commented 5 months ago

Hello again :)

I wanted to post some performance findings:

I have a roughly 10MB JSON file I'm parsing and creating a new automerge document from. Here's the size of the automerge document on disk:

Using next's Automerge.from: 1.6MB Using next + Automerge.RawString for all strings: 744KB Using stable Automerge.from (which from my understanding is using Automerge.Text for strings): 744KB

This is quite impressive!

Now, on to automerge-repo. Along with @georgewsu, I've been trying to stress test automerge-repo via the sync server.

What @georgewsu and I are trying to measure is the time it takes for the sync server to receive such a large document, and how long it takes another client to fetch it immediately after. I've created a to reproduce the findings. I've shared an example JSON payload with @alexjg, but I can't post it publicly. It's a ton of nested maps with a fair amount of strings in the maps. If there's anything I can do to improve the signal we're getting from the tests, or if I've made any mistakes in the tests that would lead to abnormal, suboptimal performance, I'd love to hear any ways I could improve them. Additionally, I'd be happy to contribute these types of tests/benchmarks back in a structured fashion so that folks can improve on existing limitations.

That said, with the payload used, the sync server took roughly 46 seconds to receive the document. While receiving the document, it blocked the main thread, severely limiting concurrency (other clients encountered socket timeouts). I pinpointed it to a call to A.receiveSyncMessage in the doc synchronizer. A deeper analysis shows calls into wasm; notably, calls to applyPatches comprise most of the time spent. Here's a CPU profile you can load into Chrome dev tools if you'd like to look at my findings CPU-20240411T110711.cpuprofile

The fact that the main thread is blocked is problematic regarding concurrency. I'm just brainstorming ideas here, but would it make sense to throw that call to A.receiveSyncMessage in a separate thread? I know that automerge-repo is intended to work in the browser and node and that isomorphic threading can be a pain (web workers vs. worker threads). Also, the overhead of sending the data to/from the worker without a SharedArrayBuffer might impact potential performance gains. Still, I would see the benefit being that, so long all CPU cores aren't busy receiving huge docs (I imagine an infrequent use case), at least a few huge docs could be received while other cores happily synchronize more minor edits.

pvh commented 5 months ago

Awesome, @kid-icarus. Note that there's a little sleight of hand in here for file size -- we're deflating at some point.

I see you're on 2.1.10 in the package.json. There was recently a performance fix for documents with lots of small objects. I think it went into 2.11.13 (which I think I shipped in the most recent automerge-repo a few days ago.)

Can you see if/how much that helps?

That said, there's an eventual physical limit to how much we can optimize the message receipt for a very large message: you can imagine a 1GB text field, for example. Our long-term strategy here has a few components:

Architecturally, there are two parts to Automerge: a CRDT that has to decide what the state of the world is/should be and a materialized view that represents the full document and is updated when changes occur. The latter has to live in the render thread (it's the state of your program, you need it there) but theoretically we could house the former elsewhere and communicate over some messaging channel.

Historically, Automerge worked this way (way back in the early, very slow days pre 1.0) but marshalling messages over thread boundaries is prone to performance and latency problems as well (not to mention architectural complexity for implementers), so in the 2.0 / Rust line we put an emphasis on "just be fast enough".

I think most of the work we've described above is "shovel ready" in the sense that we could move on it at any time if we had the development bandwidth to do so... but the project is not funded at a scale where we can do that. We are not a VC backed startup and our funding is earmarked to support our research goals.

So if you want us to do something, you can

In the third case, I'll note that even sending things upstream via the DIY approach is very welcome but it does impose costs on us: we have to review code, write tests, cut releases, and otherwise maintain the software going forward. If you build a thing we don't want or like... we won't merge it (so if you want us to merge it, keep us in the loop during the design.) That said, we have a bunch of folks in the community who have just become committers on the project and we'd love to have more.

At the moment our big projects are to rebuild the sync system to better support many documents (funded by NLNet), deliver rich text support (funded by sponsors), work on memory use (needed for in-house work). I think left to my own devices my own next big push will be around auth stuff. Maybe some version control APIs. Hard to say.

Anyway, every bit of research like this is extremely welcome and helpful! Working on these problems is very much "on the list" and we'd love to get to them just as soon as we're able. Thanks for reading my rant :)

kid-icarus commented 5 months ago

Thanks for the timely response @pvh, I really appreciate it šŸ™šŸ»

So first of all, apologies for not using the latest automerge(-repo) versions. I'm seeing a drastic improvement with automerge 2.1.13 and automerge-repo 1.1.15.

so in the 2.0 / Rust line we put an emphasis on "just be fast enough".

The 46 second wait is now roughly 3-4 seconds. I think this is certainly fast enough :) I'll go ahead and close this issue out. When not using RawString on all the strings, this increases to 9 seconds but roughly 90% of all the strings do not need to be collaborative and should really be RawStrings.

Without any changes, and using only RawString for strings in the document. Here is a CPU profile: CPU-20240411T174355.cpuprofile

I'll take some time to respond to the rest of your comment, but in short I would love to contribute back in a way that isn't a burden on everyone else. I understand the challenges of being resource-constrained, and wouldn't open any PRs without gauging interest and keeping folks in the loop beforehand. Feel free to reach out to me in the #devs channel in Discord (or you can always DM me if any of these issues become frustrating). I know everyone is doing their best given project priorities and constraints.

pvh commented 5 months ago

I'm going to reopen this because 2.5s is still quite long for a document this small. Thanks for the trace, though. The save() seems like about what I'd expect at 180ms (of which 40% is compression).

Looking at the rest, we've got several big processes running in this trace. In order:

1) 500ms - receiveSyncMessage

Looking at this trace I think we could plausibly get ~1s improvement with a little bit of plumbing work (reuse save() content, use last diff cache). The progressDocument stuff feels like there shsould be room to improve (though I haven't looked closely enough to have a fix) and the receiveSyncMessage stuff itself should be helped by Orion's current memory work (though we'll have to wait and see.)

@kid-icarus one useful reference would be a performance comparison to raw JSON with JSON.{stringify/parse}. That can be our performance reference since it's likely about as well optimized as browser vendors are able.