automerge / automerge-classic

A JSON-like data structure (a CRDT) that can be modified concurrently by different users, and merged again automatically.
http://automerge.org/
MIT License
14.75k stars 466 forks source link

Improve performance of repeated calls to `Automerge.merge()` #368

Closed ept closed 3 years ago

ept commented 3 years ago

Previously, Automerge.merge(localDoc, remoteDoc) would get all changes from remoteDoc and apply them to localDoc, relying on the built-in change deduplication to ignore any changes already present in localDoc. This works but is quite slow. This PR changes merge to get only those changes in remoteDoc that are not already present in localDoc, and applies those.

On @josephg's benchmark in #360 this improves performance by a factor of 15 on my machine:

Note this PR introduces a new backend function getChangesAdded(), which will also need to be implemented in the Rust backend.

josephg commented 3 years ago

Thanks for working on this! Its weird - I'm seeing very different performance numbers from you:

So this helps, but there's still more than an order of magnitude slowdown compared to 0.14.2.

The only changes I made to the script were removing the console.time() / timeEnd() calls in the middle, and changing the require('automerge') call with require('.') when testing this branch.

$ node --version
v16.1.0   # (on linux, if that matters somehow)
$ grep version node_modules/automerge/package.json 
  "version": "1.0.1-preview.1",
$ node am.js
In 5 seconds ran 36 iterations
$ yarn add --dev automerge@0.14.2
...
Done in 0.11s.
$ grep version node_modules/automerge/package.json
  "version": "0.14.2",
$ node am.js                                
In 5 seconds ran 10228 iterations
[leaf] ~/3rdparty/automerge:main ✓ $ git checkout faster-merge
Branch 'faster-merge' set up to track remote branch 'faster-merge' from 'origin'.
Switched to a new branch 'faster-merge'
[leaf] ~/3rdparty/automerge:faster-merge ✓ $ yarn build
...
Done in 2.92s.
[leaf] ~/3rdparty/automerge:faster-merge ✓ $ git log -n1
commit 4e2f32e832b0a21310989e13901a7cb76c085b8e (HEAD -> faster-merge, origin/faster-merge)
[leaf] ~/3rdparty/automerge:faster-merge ✓ $ node am.js 
In 5 seconds ran 331 iterations
josephg commented 3 years ago

Aaah for some reason the performance drop I'm seeing is coming from something webpack is doing. If I make this change to package.json:

  "main": "src/automerge.js",

Then I see performance numbers similar to you!

[leaf] ~/3rdparty/automerge:faster-merge ✗ $ node am.js      
In 5 seconds ran 7374 iterations
josephg commented 3 years ago

Delving further, the performance returns if I remove babel from the webpack.config.js. It looks like one of the JS transforms / polyfills that babel is doing axes performance.

ept commented 3 years ago

Oh wow. Good find. When running with the webpack output (dist/automerge.js after running yarn build) it does indeed get massively slower. Flamegraph shows the code is spending almost all of its time in Decoder.readInt64(). I manually compared the webpack output for that function to the original code and did not find any significant differences. I guess that maybe Uint8Array is getting replaced by a slow polyfill?

How do you change webpack.config.js to remove Babel? I don't know how this stuff works…

skokenes commented 3 years ago

Seems like its coming from core-js https://github.com/zloirock/core-js#caveats-when-using-typed-arrays-polyfills

Definitely interested in this as our UI package that uses Automerge does go through its own bundling process with babel/core-js. Would love to know how we do that without accidentally hitting this typed array polyfill issue

josephg commented 3 years ago

The babel configuration is these lines in webpack.config.js:

  module: {
    rules: [
      {
        test: /\.js$/,
        exclude: /node_modules/,
        use: {
          loader: 'babel-loader',
          options: {
            presets: [
              ['@babel/preset-env', { targets: 'defaults' }],
            ]
          }
        }
      }
    ]
  }

This entry in the "rules" array is naming babel-loader, and telling it to transpile the javascript to work on a list of targets that browserlist considers reasonable "defaults". And it looks like that includes some super old web browsers which don't support typed arrays.

Personally I think its weird for an npm module to do this work. If people want to use automerge in their web app, they should just compile / transpile it themselves to support whatever list of web browsers they decide to support. A lot of work has gone into making this sort of thing work reasonably well through npm anyway.

Anyway, options, in decreasing severity:

  1. Remove webpack entirely. Just have package.json reference src/automerge.js. Automerge doesn't need a build step. Users who want compiled javascript can make their own.
  2. Remove babel from webpack. Just make the webpack config have an empty rules: [] array.
  3. Change the babel preset target from "defaults" to either "> 1%, not dead" or "maintained node versions" (which is my preference). The former says "allow features only if they're supported in every browser with more than 1% usage share". It looks like thats enough for typed arrays not to be polyfilled. The latter says "make sure it'll work on the oldest version of nodejs in the LTS window". (Which at the moment iirc is node 12+). Given we're publishing to npm, that makes lots of sense to me. Either of those settings seems to be good enough to remove the typed array polyfill.

If you want to learn more, here's all the values we can put in the targets: 'xxx' field: https://github.com/browserslist/browserslist#full-list

And here's documentation on how to set up babel/present-env: https://babeljs.io/docs/en/babel-preset-env .

skokenes commented 3 years ago

I agree with @josephg , I think AM could at least get away with dropping the @babel/preset-env preset all together as it should be the consumer's job to bundle for their target environment

josephg commented 3 years ago

Yeah I agree. But just to be clear, removing the @babel/preset-env line makes babel revert to its default transpile target. Which still unfortunately polyfills typed arrays.

If we want to leave JS target selection up to our users, we should either remove babel entirely (module: { rules: [] }) or remove the webpack build step entirely, and as I said above just reference src/automerge.js directly from package.json. The package works fine as it is.

ept commented 3 years ago

I think it's useful to keep the webpack config and the build step as it produces a self-contained JS file that can be used directly in browsers without further work. It's also used by the browser-based tests that are run as part of Automerge's CI. However, I'm happy for that JS file to only work in modern browsers, and if people want to target older browsers I think it would be fair to require them to perform their own transpiling. Thus, I think we can remove Babel (which also has the nice property of making the node_modules directory a bit smaller). To me it also seems fine to point package.json directly at the original source (I never quite understood why it should point at the webpack output). I have added commits to that effect to this PR.

If anybody disagrees on how this should be done, please speak up! I am totally clueless about JS packaging…