automerge / automerge

A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically.
https://automerge.org
MIT License

Automerge Rust - performance issue with Automerge::clock_at #686

Open gterzian opened 1 year ago

gterzian commented 1 year ago

When running the bakery example with Spanreed, I've noticed the following:

At the beginning of running the program (as part of a three-process ensemble), CPU usage is a few percent, and sampling the process shows the repo thread not doing much (mostly blocked on the select).

[Screenshot 2023-07-30 at 4:47:29 PM]

If you let the example run for a while, say 10 minutes, CPU usage climbs to close to 100%, and the sample shows the repo thread spending most of its time running Repo::sync_documents, which is mostly running Automerge::clock_at, which in turn seems to spend most of its time manipulating a btree.

[Screenshot 2023-07-30 at 5:06:38 PM]

Detailed view of Repo::sync_documents:

[Screenshot 2023-07-30 at 5:07:05 PM]

And a detailed view of Automerge::get_changes:

[Screenshot 2023-07-30 at 5:07:49 PM]
gterzian commented 1 year ago

This is running this branch, using Automerge 0.5.0 (the problem was similar with ChangeGraph on 0.4.0).

You can get to 100% CPU almost immediately if you remove this sleep.

alexjg commented 1 year ago

Most of the work that has to be done in get_changes is generating clocks, so what this is telling us is that the application is spending most of its time in get_changes. I think we'll need to do a bit more investigating to figure out whether this is a problem in automerge, spanreed, or the bakery example. Questions that occur to me:

  1. How many times are we calling get_changes? I.e. what is the average time of a get_changes call? Does this increase over time?
  2. How many changes are we returning from get_changes? How big are the documents involved? Does this increase over time?
  3. Can we get hold of a particular document which is causing trouble and investigate a single get_changes call?
alexjg commented 1 year ago

Actually, given that get_changes is most likely being called inside generate_sync_message, we should probably repeat all the questions above, except replacing get_changes with generate_sync_message.
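
For gathering those numbers, a small timing wrapper around each call site may be the quickest route. Below is only a sketch: the timed helper is hypothetical, and the usage comment assumes the automerge::sync API (a document implementing SyncDoc plus a per-peer sync State, with the sync message exposing its changes).

```rust
use std::time::Instant;

// Hypothetical helper: time a closure and print the elapsed time with an
// explicit unit, so milliseconds and nanoseconds cannot be mixed up.
fn timed<T>(label: &str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let result = f();
    let elapsed = start.elapsed();
    println!("{label} took {} ns ({} ms)", elapsed.as_nanos(), elapsed.as_millis());
    result
}

// Sketch of usage at the call sites in the sync loop, assuming `doc`
// implements automerge::sync::SyncDoc and `peer_state` is an
// automerge::sync::State:
//
//     let msg = timed("generate_sync_message", || doc.generate_sync_message(&mut peer_state));
//     if let Some(msg) = &msg {
//         println!("generated a message carrying {} changes", msg.changes.len());
//     }
```

Logging both the per-call duration and the number of changes carried should show whether calls are getting slower, more frequent, or both.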

gterzian commented 1 year ago

I will double check when I am back in the office next week, but so far:

  1. The time to call get_changes increases over time, not the number of times it is called (so it doesn't seem to be an explosion of sync messages). Will check the exact timing.
  2. Will check.
  3. Perhaps the bakery example can be changed to stop at a certain number corresponding to a run long enough to be problematic, and an API that calls get_changes could be added so that a single call can be made at that point.
gterzian commented 1 year ago

Some prints from the beginning of the run (CPU usage in the low tens of percent):

Messages received from RepoId("12c0d873-7aa5-4dfc-96a2-4eb9001cd53b") 3
Running receive_sync_message took 28208 milliseconds.
Running receive_sync_message took 22958 milliseconds.
Running receive_sync_message took 21792 milliseconds.
Running generate_sync_message took 35542 milliseconds.
Running generate_sync_message took 39375 milliseconds.
Generating 0 sync messages 78833 milliseconds.
Syncing docs with 3 sync messages took 199500 milliseconds.
Syncing docs with 0 sync messages took 42 milliseconds.
Messages received from RepoId("b9e46fdc-291a-4b63-9d41-baf5dcd60b33") 1
Running receive_sync_message took 19083 milliseconds.
Running generate_sync_message took 30541 milliseconds.
Running generate_sync_message took 26042 milliseconds.
Generating 0 sync messages 59958 milliseconds.
Syncing docs with 1 sync messages took 88416 milliseconds.

Prints from the end of the run, about a cup of coffee later (at 99% CPU):

Messages received from RepoId("b9e46fdc-291a-4b63-9d41-baf5dcd60b33") 3
Running receive_sync_message took 4690708 milliseconds.
Running receive_sync_message took 4640333 milliseconds.
Running receive_sync_message took 4661333 milliseconds.
Running generate_sync_message took 8178041 milliseconds.
Running generate_sync_message took 8043792 milliseconds.
Generating 1 sync messages 16230500 milliseconds.
Syncing docs with 3 sync messages took 30251834 milliseconds.
Running generate_sync_message took 8201792 milliseconds.
Running generate_sync_message took 8129125 milliseconds.
Generating 0 sync messages 16337667 milliseconds.
Syncing docs with 0 sync messages took 500 milliseconds.
Messages received from RepoId("12c0d873-7aa5-4dfc-96a2-4eb9001cd53b") 2
Running receive_sync_message took 4331584 milliseconds.
Running receive_sync_message took 3965834 milliseconds.
Running generate_sync_message took 8085666 milliseconds.
Running generate_sync_message took 8069416 milliseconds.
Generating 1 sync messages 16166167 milliseconds.
Syncing docs with 2 sync messages took 24489042 milliseconds.

It seems timings overall go up by about 150-200 times, while the number of messages applied/generated stays the same.

For context, here is a PR that shows where those prints come from.

I think this answers 1 above. @alexjg

(By the way, those are nanoseconds, not milliseconds; I made an error in the prints.)
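
(Rough spot check against the logs above: "Syncing docs with 3 sync messages" went from about 199,500 ns (~0.2 ms) at the start of the run to about 30,251,834 ns (~30 ms) at the end, roughly 150x, and receive_sync_message went from about 28,208 ns to about 4,690,708 ns, roughly 165x.)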

alexjg commented 1 year ago

This is running in release?

gterzian commented 1 year ago

This is running in release?

Yes, running the commands as found in the README.

gterzian commented 1 year ago

How big are the documents involved? Does this increase over time?

The documents consist of this. The u32 values are incremented over time; the maps do not grow over time.
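
For reference, a rough sketch of a document with that shape (key names made up here), showing how the map stays a fixed size while the change history keeps growing with every overwrite:

```rust
use automerge::{transaction::Transactable, AutoCommit, ObjType, ROOT};

// Rough sketch of a document shaped like the bakery state described above:
// a fixed map whose integer values are overwritten over time, so the map
// never grows, but every overwrite adds another change to the change graph.
fn run() -> Result<(), automerge::AutomergeError> {
    let mut doc = AutoCommit::new();
    let state = doc.put_object(ROOT, "state", ObjType::Map)?;
    doc.put(&state, "choosing", false)?;
    for n in 0u64..1_000 {
        // Same key, growing value: document size stays flat, history grows.
        doc.put(&state, "number", n)?;
        doc.commit();
    }
    Ok(())
}
```

That would be consistent with the growth being in the history (the number of changes) rather than in the visible document.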

alexjg commented 1 year ago

Nice. Okay I think what we need to do is get hold of a single problematic message and document somehow.

gterzian commented 1 year ago

It appears that as the ChangeGraph grows, this loop takes more and more time:

https://github.com/automerge/automerge/blob/e7114fd4d6cdcd1b93aac071f8282548eec23c98/rust/automerge/src/change_graph.rs#L171

All the data structure manipulation in that loop grows at about the same rate, with the btrees being the most expensive.

The heads argument to clock_for_heads remains constant in size, usually one or two heads.
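
To make that concrete, here is a simplified illustration, not automerge's actual code, of a clock_for_heads-style computation: even with only one or two heads, it has to walk every ancestor change, and the per-actor bookkeeping is where the btree work comes from.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

// Simplified model of a change graph: each change names its actor, its
// sequence number for that actor, and the hashes of its dependencies.
type Hash = u64;
type ActorId = usize;

struct Change {
    actor: ActorId,
    seq: u64,
    deps: Vec<Hash>,
}

// Compute a per-actor clock for the given heads by walking all ancestors.
// The work grows with the total number of reachable changes, not with the
// number of heads, which is why a constant-size heads argument can still
// get slower as the graph grows.
fn clock_for_heads(graph: &HashMap<Hash, Change>, heads: &[Hash]) -> BTreeMap<ActorId, u64> {
    let mut clock = BTreeMap::new();
    let mut seen = HashSet::new();
    let mut stack: Vec<Hash> = heads.to_vec();
    while let Some(hash) = stack.pop() {
        if !seen.insert(hash) {
            continue;
        }
        if let Some(change) = graph.get(&hash) {
            // Keep the highest sequence number seen per actor.
            let entry = clock.entry(change.actor).or_insert(0);
            *entry = (*entry).max(change.seq);
            stack.extend(&change.deps);
        }
    }
    clock
}
```

In this model each sync cycle repeats the full traversal, which would match the observation that the loop in change_graph.rs dominates as the graph grows.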

gterzian commented 1 year ago

I've opened a PR that highlights the issue. The test currently fails because calling clock_for_heads on a large change graph takes about 2 ms, while on a small change graph it takes less than 1 ms. Measured in nanoseconds, it's about a 75x difference. If you increase the number of changes by ten, the difference goes up by about ten as well, to roughly 750x.

gterzian commented 1 year ago

Perhaps a dupe of, or an issue similar to, https://github.com/automerge/automerge/issues/596