Level / abstract-level

Abstract class for a lexicographically sorted key-value database.
MIT License

Add hooks and deprecate `batch`, `put` & `del` events #45

Closed. vweevers closed this pull request 2 years ago.

vweevers commented 2 years ago

Adds `postopen`, `prewrite` and `newsub` hooks that allow userland "hook functions" to customize the behavior of the database. See the README for details. A quick example:

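```js
// fooIndex is assumed to be a sublevel created elsewhere (not shown)
db.hooks.prewrite.add(function (op, batch) {
  // For each put, also write an index entry: op.value.foo -> op.key
  if (op.type === 'put') {
    batch.add({
      type: 'put',
      key: op.value.foo,
      value: op.key,
      sublevel: fooIndex
    })
  }
})
```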
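To make the hook semantics concrete, here is a self-contained toy model. It is not abstract-level's implementation. The `ToyDB` class, its `Map` store, its `sublevel()` helper and the `'foo-index'` name are all hypothetical. It assumes, like the example above, that a prewrite hook receives each operation plus a batch whose `add()` queues extra operations, and that an operation with a `sublevel` property is written under that sublevel's `!name!` prefix.

```js
// Toy model of prewrite hooks (hypothetical; not abstract-level's implementation)
class ToyDB {
  constructor (prefix = '', store = new Map()) {
    this.prefix = prefix // e.g. '!foo-index!' for a sublevel, '' for the root
    this.store = store // shared by the root and all its sublevels
    this.hooks = { prewrite: new Set() }
  }

  // Hypothetical helper: a child namespace that shares the same store
  sublevel (name) {
    return new ToyDB(this.prefix + '!' + name + '!', this.store)
  }

  batch (ops) {
    const extra = []
    // Passed to each hook; add() queues an additional operation
    const hookBatch = { add (op) { extra.push(op) } }

    for (const op of ops) {
      for (const fn of this.hooks.prewrite) fn(op, hookBatch)
    }

    // Apply the original operations plus whatever the hooks added.
    // An op with a `sublevel` property is written under that sublevel's prefix.
    for (const op of [...ops, ...extra]) {
      const key = (op.sublevel || this).prefix + op.key
      if (op.type === 'put') this.store.set(key, op.value)
      else if (op.type === 'del') this.store.delete(key)
    }
  }
}

const db = new ToyDB()
const fooIndex = db.sublevel('foo-index') // hypothetical index sublevel

// The hook from the example above, unchanged
db.hooks.prewrite.add(function (op, batch) {
  if (op.type === 'put') {
    batch.add({
      type: 'put',
      key: op.value.foo,
      value: op.key,
      sublevel: fooIndex
    })
  }
})

db.batch([{ type: 'put', key: 'k1', value: { foo: 'bar' } }])
// db.store now maps 'k1' -> { foo: 'bar' } and '!foo-index!bar' -> 'k1'
```

One write from the caller becomes two writes in the same batch, which is what lets a userland module maintain an index without monkeypatching `put` or `batch`.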

More generally, this is a move towards "renewed modularity". Our ecosystem is old, and many modules no longer work because they had no choice but to monkeypatch database methods whose signatures have since changed.

So in addition to hooks, this:

No breaking changes, yet. Using hooks means opting in to new behaviors (like the new `write` event) and disabling some old behaviors (like the deprecated events). Later on we can make those the default behavior, regardless of whether hooks are used.
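The opt-in can be pictured with a toy event emitter. This is a hypothetical sketch, not the PR's code: `ToyEvents` and `usesHooks()` are made-up names, and treating "using hooks" as "at least one prewrite hook is registered" is an assumption. Only event emission is modeled; the write itself is omitted.

```js
// Toy model of opt-in events (hypothetical; not abstract-level's implementation)
const { EventEmitter } = require('events')

class ToyEvents extends EventEmitter {
  constructor () {
    super()
    this.hooks = { prewrite: new Set() }
  }

  // Assumption: "using hooks" means at least one prewrite hook is registered
  usesHooks () {
    return this.hooks.prewrite.size > 0
  }

  batch (ops) {
    // The actual write is omitted; only the choice of event is modeled
    if (this.usesHooks()) {
      this.emit('write', ops) // new behavior, opted into by using hooks
    } else {
      this.emit('batch', ops) // old behavior, deprecated by this PR
    }
  }
}

const db = new ToyEvents()
const seen = []
db.on('batch', () => seen.push('batch'))
db.on('write', () => seen.push('write'))

db.batch([{ type: 'put', key: 'a', value: '1' }]) // no hooks yet: 'batch'
db.hooks.prewrite.add(function () {})
db.batch([{ type: 'put', key: 'b', value: '2' }]) // hooks in use: 'write'
// seen -> ['batch', 'write']
```

Because existing listeners keep receiving the old events until a hook is added, code that never touches hooks sees no change, which is why the PR can ship without a breaking change.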

TODO:

Closes https://github.com/Level/community/issues/44.

vweevers commented 2 years ago

Initial batch benchmarks (on memory-level) look good. If you're not using hooks or events, the hooks branch of abstract-level is faster than main. If you are using events, `db.batch()` with a `write` event listener is 3-4% slower than `db.batch()` with a `batch` event listener (on either branch). That's fair, since the `write` event carries more data.

vweevers commented 2 years ago

`db.put()` performance is good too (after 75c75e2). The hooks branch is faster than the main branch if no events are used. As expected, it becomes slower when you use events or prewrite hooks. In the table below, `events.put=1` means the benchmark had one listener for the `put` event. Similarly, `hooks.prewrite=1` means one prewrite hook function, and `hooks.prewrite=100` means it did:

```js
for (let i = 0; i < 100; i++) {
  db.hooks.prewrite.add(function () {})
}
```

```
$ level-bench plot put
benchmark put on memory-level@1.0.0 win32 x64
node@16.9.1 n=1M concurrency=4 valueSize=100B keys=random values=random

1  memory-level#hooks                      36588 ops/s ±8.51%  fastest
2  memory-level#main                       35837 ops/s ±8.12%   +1.70%
3  memory-level#hooks  hooks.prewrite=1    35286 ops/s ±6.51%   +1.75%
4  memory-level#main   events.put=1        35508 ops/s ±7.47%   +2.01%
5  memory-level#hooks  events.write=1      34692 ops/s ±6.83%   +3.69%
6  memory-level#hooks  hooks.prewrite=100  34302 ops/s ±8.28%   +6.05%
```
Plot: ![put 1667342642043](https://user-images.githubusercontent.com/3055345/199357007-e95a417c-8532-4c1d-84e6-116a4b405503.png)

vweevers commented 2 years ago

In classic-level, adding a prewrite hook function has a bigger effect: one hook makes `db.put()` about 8% slower than the fastest run, versus under 2% on memory-level. That's not a blocker for this PR, but we may want to look into optimizing batches at some point.

```
$ level-bench plot put
benchmark put on classic-level@1.2.0 win32 x64
node@16.9.1 n=1M concurrency=4 valueSize=100B keys=random values=random

1  classic-level#main                     30548 ops/s ±7.59%  fastest
2  classic-level#hooks                    30424 ops/s ±7.32%   +0.16%
3  classic-level#hooks  hooks.prewrite=1  28070 ops/s ±7.55%   +8.08%
```
vweevers commented 2 years ago

There's one remaining issue to fix (or not). If you do:

```js
const data = db.sublevel('data')
const users = data.sublevel('users')

data.on('write', function (ops) {
  const wrongKey = ops[0].key
})

// A chained batch only takes effect (and emits) once write() is called
data.batch().del('alice', { sublevel: users }).write()
```

Then the key emitted by the `data` sublevel (`wrongKey`) is `!data!!users!alice` rather than `!users!alice`, which is the key relative to `data`. This is a result of how sublevels work in general, and I don't yet have a solution.
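The mismatch follows from how prefixes compose. In this sketch, a hypothetical `prefixOf(path)` builds the `!name!` prefix that a sublevel at a given path adds relative to the root database, and a hypothetical `keyRelativeTo(ancestorPath, path, key)` drops the ancestor's own segments. Both names are illustrative only; the `!name!` format matches the `!data!!users!alice` key above.

```js
// Illustrative helpers (hypothetical names) modeling sublevel key prefixes.
// Assumes sublevel names don't contain '!'.

// ['data', 'users'] -> '!data!!users!'
function prefixOf (path) {
  return path.map((name) => `!${name}!`).join('')
}

// The key of `key` in the sublevel at `path`, as seen from `ancestorPath`
function keyRelativeTo (ancestorPath, path, key) {
  const isAncestor = ancestorPath.length <= path.length &&
    ancestorPath.every((name, i) => path[i] === name)
  if (!isAncestor) throw new Error('not an ancestor')
  // Drop the ancestor's own segments; keep the rest of the prefix
  return prefixOf(path.slice(ancestorPath.length)) + key
}

// What the `data` sublevel currently emits: the full key relative to `db`
const emitted = prefixOf(['data', 'users']) + 'alice' // '!data!!users!alice'

// What `data` should emit: the key relative to itself
const expected = keyRelativeTo(['data'], ['data', 'users'], 'alice') // '!users!alice'
```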

vweevers commented 2 years ago

I have a solution and a PoC implementation, but it'll hurt performance for nested sublevels. Given `users = db.sublevel('data').sublevel('users')`, instead of forwarding its operations directly to `db`, `users` will forward them to the `data` sublevel, which in turn forwards them to `db`. That is, `users.batch([])` calls `data.batch([])`, which calls `db.batch([])`.

I have to benchmark that and see what tweaks can be made, but even if performance is significantly worse (and I think it will be), it might be worth it, because it benefits both events and hooks: `users.batch([])` would trigger the prewrite hook of `users`, then of `data`, then of `db`. Same for the `write` event. So no matter what kind of database you have (sublevel or not, nested or not), it works the same. That should benefit modularity.
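A toy model of the proposed forwarding, under stated assumptions: a hypothetical `Level` class where only the root holds a `Map`, and each level runs its own prewrite hooks (simplified here to take only `op`), then prepends its `!name!` prefix before handing the operations to its parent. This is not the PoC implementation. It only illustrates why, with forwarding, hooks fire from `users` up to `db` and each level sees keys relative to itself, which would also give `data` the expected `!users!alice`.

```js
// Toy model of forwarding through each ancestor (hypothetical; not the PoC code)
class Level {
  constructor (name, parent = null) {
    this.name = name
    this.parent = parent
    this.hooks = { prewrite: new Set() }
    this.store = parent ? null : new Map() // only the root holds data
  }

  sublevel (name) {
    return new Level(name, this)
  }

  batch (ops) {
    // 1. Run this level's own hooks, with keys relative to this level
    for (const op of ops) {
      for (const fn of this.hooks.prewrite) fn(op)
    }

    if (this.parent === null) {
      // Root: apply the operations
      for (const op of ops) {
        if (op.type === 'put') this.store.set(op.key, op.value)
        else if (op.type === 'del') this.store.delete(op.key)
      }
      return
    }

    // 2. Prepend this level's prefix and forward to the parent.
    //    Each extra level adds a copy and a call, hence the cost at depth.
    const prefix = `!${this.name}!`
    this.parent.batch(ops.map((op) => ({ ...op, key: prefix + op.key })))
  }
}

const db = new Level('db')
const data = db.sublevel('data')
const users = data.sublevel('users')

const calls = []
for (const level of [db, data, users]) {
  level.hooks.prewrite.add((op) => calls.push(`${level.name} saw ${op.key}`))
}

users.batch([{ type: 'put', key: 'alice', value: '1' }])
// calls, in order:
//   'users saw alice'
//   'data saw !users!alice'
//   'db saw !data!!users!alice'
```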

It would make this PR semver-major, for two reasons:

  1. The change in performance. We could add support for `db.sublevel(['data', 'users'])` to give people a way to avoid that overhead.
  2. We'd no longer support passing a `sublevel` option that isn't a descendant. So given `a = db.sublevel('a')` and `b = db.sublevel('b')`, you could no longer do `b.batch().del('1', { sublevel: a })`.
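Both points can be sketched with plain prefix strings. This assumes, hypothetically, that a sublevel is identified by its full prefix relative to the root. A flat `db.sublevel(['data', 'users'])` can then compute `!data!!users!` once and write straight to the root, and the descendant rule from point 2 becomes a prefix check. `flatPrefix` and `isDescendant` are illustrative names, not abstract-level API.

```js
// Illustrative helpers (hypothetical names), not abstract-level API.
// Assumes sublevel names don't contain '!'.

// Point 1: a flat sublevel like db.sublevel(['data', 'users']) can compute
// its full prefix once, instead of each level adding its own on every write
function flatPrefix (names) {
  return names.map((name) => `!${name}!`).join('')
}

// Point 2: a `sublevel` option is only accepted if it lives under the level
// doing the write. Prefixes end in '!', so '!a!' does not match '!ab!'.
function isDescendant (candidatePrefix, ownPrefix) {
  return candidatePrefix.length > ownPrefix.length &&
    candidatePrefix.startsWith(ownPrefix)
}

const root = '' // the root database has no prefix
const a = flatPrefix(['a'])
const b = flatPrefix(['b'])
const data = flatPrefix(['data'])
const users = flatPrefix(['data', 'users'])

isDescendant(users, data) // true: data.batch() may target users
isDescendant(a, root) // true: every sublevel descends from the root
isDescendant(a, b) // false: b.batch().del('1', { sublevel: a }) would be rejected
```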

In that case, I might as well remove the `batch`, `put` and `del` events rather than deprecate them.

vweevers commented 2 years ago

@juliangruber @ralphtheninja any objections? The batch, put and del events are 10 years old, so I don't want to take removing them lightly.

vweevers commented 2 years ago

> I have to benchmark that

Results for db.put() on memory-level, comparing no sublevel, 1 sublevel (!foo!), 2 sublevels (!foo!!bar!) and more:

```
$ level-bench plot put
benchmark put on memory-level@1.0.0 win32 x64
node@16.9.1 n=1M concurrency=4 valueSize=100B keys=random values=random

1   memory-level#hooks                             35781 ops/s ±7.11%  fastest
2   memory-level#main                              35794 ops/s ±8.69%   +1.43%
3   memory-level#hooks  !foo!                      30701 ops/s ±7.04%  +14.15%
4   memory-level#main   !foo!                      30183 ops/s ±5.55%  +14.40%
5   memory-level#main   !foo!!bar!                 29386 ops/s ±5.66%  +16.75%
6   memory-level#hooks  !foo!!bar!                 28865 ops/s ±5.07%  +17.76%
7   memory-level#main   !foo!!bar!!baz!            28813 ops/s ±5.46%  +18.22%
8   memory-level#main   !foo!!bar!!baz!!bam!       28720 ops/s ±5.48%  +18.49%
9   memory-level#main   !foo!!bar!!baz!!bam!!boo!  27970 ops/s ±5.96%  +20.98%
10  memory-level#hooks  !foo!!bar!!baz!            27796 ops/s ±5.59%  +21.20%
11  memory-level#hooks  !foo!!bar!!baz!!bam!       26067 ops/s ±6.95%  +27.04%
12  memory-level#hooks  !foo!!bar!!baz!!bam!!boo!  25255 ops/s ±5.33%  +28.22%
```

At a depth of 2 sublevels, the difference between main and hooks is negligible. But it gets progressively worse the deeper you go: at a depth of 5, hooks does 25255 ops/s against 27970 ops/s for main. That's partially explained by having to copy longer prefixes, but main performs more consistently across sublevel depths.
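To put numbers on "progressively worse", this sketch takes the mean ops/s per depth from the results above and computes how much slower the hooks branch is than main at the same depth. It is a plain ratio of means, not how level-bench computes its own percentage column (which is relative to the fastest run), and it ignores the ±5-9% margins, so the small gaps at depths 0 to 2 are within noise.

```js
// Mean ops/s for db.put() on memory-level, copied from the results above.
// Index = sublevel depth: 0 = no sublevel, 1 = !foo!, 2 = !foo!!bar!, ...
const main = [35794, 30183, 29386, 28813, 28720, 27970]
const hooks = [35781, 30701, 28865, 27796, 26067, 25255]

// How much slower hooks is than main, as a percentage of main's throughput.
// A negative value means hooks was faster at that depth.
const gaps = main.map((m, depth) => ((m - hooks[depth]) / m * 100).toFixed(1))
// gaps -> ['0.0', '-1.7', '1.8', '3.5', '9.2', '9.7']
```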

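With support for `db.sublevel(['foo', 'bar'])` (the row marked `flat` below), we can recover the loss: a flat sublevel at a depth of 5 does 29008 ops/s, compared to 25255 ops/s for the nested equivalent on hooks and 27970 ops/s on main: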

```
1   memory-level#hooks                                  35781 ops/s ±7.11%  fastest
2   memory-level#main                                   35794 ops/s ±8.69%   +1.43%
3   memory-level#hooks  !foo!                           30701 ops/s ±7.04%  +14.15%
4   memory-level#main   !foo!                           30183 ops/s ±5.55%  +14.40%
5   memory-level#main   !foo!!bar!                      29386 ops/s ±5.66%  +16.75%
6   memory-level#hooks  !foo!!bar!                      28865 ops/s ±5.07%  +17.76%
7   memory-level#main   !foo!!bar!!baz!                 28813 ops/s ±5.46%  +18.22%
8   memory-level#hooks  !foo!!bar!!baz!!bam!!boo! flat  29008 ops/s ±6.28%  +18.30%
9   memory-level#main   !foo!!bar!!baz!!bam!            28720 ops/s ±5.48%  +18.49%
10  memory-level#main   !foo!!bar!!baz!!bam!!boo!       27970 ops/s ±5.96%  +20.98%
11  memory-level#hooks  !foo!!bar!!baz!                 27796 ops/s ±5.59%  +21.20%
12  memory-level#hooks  !foo!!bar!!baz!!bam!            26067 ops/s ±6.95%  +27.04%
13  memory-level#hooks  !foo!!bar!!baz!!bam!!boo!       25255 ops/s ±5.33%  +28.22%
```
vweevers commented 2 years ago

I've created a v2 branch as the new base for this PR. That allows me to move ahead with items of https://github.com/Level/abstract-level/issues/47.

juliangruber commented 2 years ago

Sorry @vweevers, I don't have time to review this ATM :|

vweevers commented 2 years ago

OK! Thanks for letting me know. FWIW I'll probably mark the hooks API as experimental (before v2 goes out the door) so there will be room for changes.