automerge / automerge-classic

A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically.
http://automerge.org/
MIT License

Performance/scalability? #89

Open aral opened 6 years ago

aral commented 6 years ago

Hey there,

First off, thank you for sharing your paper (A Conflict-Free Replicated JSON Datatype) and this library with the world.

I started playing with automerge this afternoon and ran some basic tests to see if I could use it for the federated personal web site system I’m working on. I want it to work offline, and I’d rather not privilege servers over clients (the goal, ideally, is for this to be a stepping stone towards a p2p world, which I believe is what we’re all working for) :)

So to cut to the chase, I wanted to share my very unsophisticated findings and to also inquire about whether, in the words of Steve Jobs, I’m “holding it wrong” :)

Multiple inserts

Code: https://source.ind.ie/snippets/80

I first tried pushing incremental numbers to an array in a document via the change() method. The timings (on my MacBook) ranged from 6ms for 1 push to 43.75 seconds for 10,000. 100 pushes took 175ms and 1,000 took 1.345 seconds.
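
For reference, the change() loop was along these lines (a simplified sketch of my script, not the exact snippet; the variable names are illustrative):

  const Automerge = require('automerge')
  const numInserts = 10000

  // start from an empty document containing one array
  let doc = Automerge.change(Automerge.init(), d => { d.items = [] })

  const start = Date.now()
  for (let i = 0; i < numInserts; i++) {
    // each change() records the operation and returns a new document state
    doc = Automerge.change(doc, d => { d.items.push(i) })
  }
  console.log(`Took: ${Date.now() - start}ms for ${numInserts} pushes`)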

[Screenshot: timing results for incremental pushes]

The storage requirements (tested very roughly using util.inspect to flatten the object, including all hidden fields) also seem to follow a similar curve. A single insert took up 3,557 bytes (size of the original object: 18 bytes); 10 inserts: ~17KB; 100: ~160KB; 1,000: ~1.6MB; and the array with 10,000 numbers took up ~16MB (size of the original object: ~80 bytes). Again, my testing methodology doesn’t necessarily reflect the actual amount of space these objects would take up in memory or on the file system, but unless I’ve done something really daft (which is always a possibility), it does look like an array of ~100 items or so would already make for a sluggish interface.
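
Roughly, that size estimate was along these lines (a sketch; util.inspect output length is only a loose proxy for real memory use):

  const util = require('util')

  // very rough size estimate: length of the fully expanded inspection string,
  // including Automerge's hidden internal fields
  const approxSize = util.inspect(doc, { showHidden: true, depth: null }).length
  console.log(`~${approxSize} bytes`)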

[Screenshot: storage sizes for incremental pushes]

Single insert

Then I thought maybe I was holding it wrong and it’s meant to be used with batch inserts.

Code: https://source.ind.ie/snippets/81

So instead of testing, say, 10,000 pushes to an array, I wanted to test a single change to the object where an array with 10,000 items is added.
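
In code, the batch version is roughly (again a sketch, not the exact snippet):

  // a single change() that inserts all the items in one batch
  doc = Automerge.change(Automerge.init(), d => {
    d.items = []
    for (let i = 0; i < numInserts; i++) d.items.push(i)
  })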

The results broadly mirror those of the first scenario.

The timings follow a similar curve at first, but the array with 10,000 items was ~3x slower at ~115 seconds, vs ~43 seconds using the first technique.

[Screenshot: timing results for the single-change insert]

As for the document sizes, they came out slightly smaller than with the first technique from the 100-item mark onwards: still ~1.3MB for 1,000 items and ~13MB for 10,000.

[Screenshot: storage sizes for the single-change insert]

I’d love to hear your thoughts on this, as well as any criticism of the above (including what I was benchmarking and how). My use case is a personal web site that allows both posts and messaging between sites. I was using the 10,000-item test as an upper limit for initial use (e.g., 10,000 posts in a category, or 10,000 messages in a conversation); a more realistic amount might be 20-30 to a few hundred. But even at those levels, the latency would require operations to run off the main thread to avoid a sluggish UI.

Also, since Automerge is already being used in two real-world applications, I’d love to hear of your actual experiences with performance, scalability, and storage requirements.

Thanks again for making this and sharing it. It’s a very hard problem domain indeed.


Update

Based on the feedback of @pvh and @j-f1 (thanks, folks!), I just updated my tests to include:

A key-setting test (as opposed to array.push()) – the results are generally comparable to the previous ones:

[Screenshots: key-setting test timings and storage sizes]

And measuring storage size with Automerge.save(doc) instead (the sizes come out 3x-5x smaller):

[Screenshots: storage sizes measured with Automerge.save()]

I’ve added the test files to their own repository at https://source.ind.ie/indienet/spikes/crdt/automerge

pvh commented 6 years ago

Wow! Thanks so much for this work, Aral! I suspect there's some internal O(n^2) data structure update causing grief here. We've used automerge at larger scales with both Pixelpusher and Trellis without seeing quite such a bad performance collapse, so I wonder if it's "how you're holding it" (though unlike Apple, we can probably fix our problems instead of blaming the user!)

I took a look at your performance script and I wonder if you could benchmark updating different keys that many times, which might look something like

for (let i = 0; i < numInserts; i++) {
  doc = Automerge.change(doc, (doc) => {
    doc["key" + i] = true;
  })
}

or something along those lines? Basically I suspect that having all the updates go to the same value might be what's causing the problem.

Really appreciate you contributing this analysis and very excited to look more closely at what might cause performance problems and how we can fix them.

j-f1 commented 6 years ago

For measuring the size of the doc, you should be able to measure the length of Automerge.save(doc), which returns a string containing the document and its history.
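
For example (a minimal sketch, assuming doc holds an Automerge document):

  // save() serializes the document together with its full change history
  const serialized = Automerge.save(doc)
  console.log(`Saved size: ${serialized.length} characters (~bytes for ASCII)`)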

schrepfler commented 6 years ago

Not trying to steal automerge's thunder (and thunder it has), but @aral, have you tried benchmarking https://gun.eco/ for your use case?

aral commented 6 years ago

@schrepfler I’ve looked into gun but I’m wary of it for a number of reasons (including that it has VC and because I don’t see the particular conflict-free algorithm working for my purposes). That said, great to see different people tackling this problem space in different ways and sharing their work. We can only be stronger for it :)

ept commented 6 years ago

Hi @aral, this is great, thank you for your measurements. You're not "holding it wrong" — the access patterns in your tests are things that should be supported efficiently.

The util.inspect size metric is probably not very meaningful, because Automerge internally keeps several references to the same object; those objects appear only once in memory, but would appear multiple times in the util.inspect output. But the result of Automerge.save(), as suggested by @j-f1, is a reasonable metric.

In my last round of profiling, as far as I remember, I got results broadly similar to yours. Looking into the breakdown, I found that over half the CPU time was being spent in garbage collection. Therefore I figured that in order to improve the performance of Automerge, we need to move to data structures that are more memory-efficient and GC-friendly. That would also reduce the memory use.

I have worked out an algorithm for Automerge's internal structures that will be much more memory-efficient, and have started implementing it. It'll probably take a few weeks to finish, since it requires significant refactoring of internals. But it's happening, and I am hopeful that the new approach will give significantly better performance, especially on large arrays.

In summary: I'm aware that performance is currently not great, but we have a plan for improving it, and I'm optimistic that big improvements are possible. I'd love to see more of your experiments with Automerge, and I'd love to hear more about the federated personal web site system you're planning to build.

aral commented 6 years ago

Hey @ept, thank you for the additional background – it’s very helpful.

As I understand it, GC is a hard problem in peer-to-peer systems, as you cannot be certain when a node may come back online with operations that are causally linked to garbage-collected history. I was just reading Alexei’s (@archagon) excellent round-up of CRDTs and his work with Causal Trees and Operational Replicated Data Types (ORDTs), and the solutions he presents there, including for GC, are very exciting (http://archagon.net/blog/2018/03/24/data-laced-with-history/). Not sure if you two have chatted, but you probably should, regardless of anything else :)

One of the features I want to implement on top of whatever CRDT algorithm/library I choose is to have every operation signed, and to include the public keys of authorised people within those signed messages, in an attempt to guarantee the integrity of the document (as DAT/hypercore does, for example), as well as a decentralised authentication mechanism that doesn’t rely on privileged central servers. So the DAG approach of Causal Trees sounds perfect to me :)
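
As a rough illustration of the signing idea (a hypothetical sketch using tweetnacl; none of this is part of Automerge’s API):

  const nacl = require('tweetnacl')

  // hypothetical: the author signs each serialized operation before replicating it
  const author = nacl.sign.keyPair()
  const op = new TextEncoder().encode(
    JSON.stringify({ action: 'set', key: 'title', value: 'hello' })
  )
  const signed = nacl.sign(op, author.secretKey)

  // peers verify against the author's public key before applying the operation;
  // nacl.sign.open() returns null if the signature is invalid
  const verified = nacl.sign.open(signed, author.publicKey)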

ept commented 6 years ago

A few more thoughts on the performance measurements.

The times you give above are the total time taken for n insertions; however, in many applications, you don't often insert 10,000 items into a list in one go, but rather have the list grow gradually through user activity. So perhaps the average time per insertion would be a more relevant metric? And that is 1.3ms per operation for a 1,000-item list, and 4.3ms per operation for a 10,000-item list. To be clear, that's still not blazing fast, but it's a less scary headline figure than 43 seconds.

Secondly, your script does a comparison to regular JavaScript array.push(). As Automerge uses immutable data structures, it has to clone the list for every change, whereas array.push() mutates the list. To make this an apples-to-apples comparison, the script should also clone the array on every change, like this:

  const numInserts = 10000
  let x = []
  let start = new Date()
  for (let i = 0; i < numInserts; i++) {
    x = x.slice() // clone the array, as Automerge effectively must on every change
    x.push(i)
  }
  let duration = (new Date()) - start
  console.log(`Took: ${duration}ms.`)

On my laptop, that script takes 270ms for 10,000 items, while the equivalent changes in Automerge take 39.7 seconds. Thus, Automerge currently incurs roughly a 150x overhead over plain JavaScript data structures. That is the gap we need to try to close.

aral commented 6 years ago

@ept Thanks again – this is very helpful (& agree with all your points).

PS. Forgot to mention: I had just watched your talk at a conference in London when I replied earlier. Great talk :) So exciting to see pragmatic work in this area progressing in parallel projects :)

marbemac commented 6 years ago

One thing that we noticed, space-wise, is that the Automerge data structure (the result of .save(), etc.) is VERY gzip-friendly.
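
For example (a sketch using Node's built-in zlib; the ratio will vary by document):

  const zlib = require('zlib')

  // compare the raw serialized size with its gzipped size
  const saved = Automerge.save(doc)
  const gzipped = zlib.gzipSync(Buffer.from(saved))
  console.log(`raw: ${saved.length} bytes, gzipped: ${gzipped.length} bytes`)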

jadbox commented 4 years ago

Wondering if there have been improvements since 2018?

andrewluetgers commented 4 years ago

Also interested in this, especially with regard to refactoring the internal data structures and possibly replacing immutable.js with something like immer.

PowerWeb5 commented 4 years ago

I'm likewise interested in any updates regarding optimization and benchmarks. Also, does anyone have further insights into the pros/cons vs. Gun (https://github.com/amark/gun), especially regarding performance, for real-time collaborative DB editing?

ept commented 4 years ago

There have been some performance improvements (notably #177) but there is still more to do. I have been working on new data structures on the performance branch; that work has been paused since November as I've been busy with other things, but I'm about to resume work on it. Sorry that it's taking a while — it is quite a hard problem (and therefore also difficult to delegate)!

PowerWeb5 commented 4 years ago

Thanks so much for the update, @ept, I'm glad to hear of the progress in and further plans for optimization!

As to GunDB, it seems like a non-starter due to its lack of support for compaction/GC/pruning, time travel, and easy access to change history, and that despite all the overhead dedicated to preserving such a changelog. In that sense, GunDB seems more an alternative to Hypermerge with its append-only Hypercore storage than to Automerge for general-purpose use.

dumblob commented 3 years ago

I'd like to draw attention to what @aral mentioned above almost 3 years ago, namely the article about ORDTs (and Causal Trees), because it contains a bunch of solutions to the hard problems I've seen automerge trying to tackle over the last few years.

I've complemented, for example, the pruning (space & bandwidth) and reconstruction (speed) strategies with another hybrid strategy, outlined in https://github.com/uwu-tech/Kind/issues/167#issuecomment-820320144 (compare it to https://github.com/automerge/automerge/pull/253).

@ept I very much recommend reading the article and looking at some example implementations of ORDTs (e.g. https://github.com/courajs/referent ).

dchester commented 2 years ago

We have used this wonderful library in many projects to great success (thank you!), but have also run into the performance constraints others have detailed in this issue.

In case anyone is interested, we have developed a new automerge-inspired library which takes a different underlying approach, and makes a whole lot of different trade-offs in order to optimize for performance of docs with many updates over time: https://github.com/frameable/pigeon

We have some comparison/benchmark numbers in the wiki. You can see that the improvement is dramatic for the case tested there.

Pigeon's API is mostly drop-in compatible with automerge's, though there are some key differences (see the README for more details).

ept commented 2 years ago

Hi @dchester, thanks for posting about Pigeon, it looks interesting! Seems like a good idea to explore some alternative points in the trade-off space.

Have you tried the new WebAssembly implementation of Automerge? It's much faster than the JS implementation. The low-level API is a bit different from the existing Automerge API, but it's available on npm. There's also a work-in-progress implementation of the current (high-level) Automerge API on top of the Wasm API. I'd be interested to see what the comparison looks like when you swap in the Wasm version.