Puts on `level-mem` + `subleveldown` versus `memory-level` + sublevels, with json encoding. Win.
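For context, a minimal sketch of the new-style setup being compared here (assuming the `memory-level` API; not the actual benchmark code):

```js
const { MemoryLevel } = require('memory-level')

async function main () {
  // New style: sublevels are built into the db itself,
  // replacing the separate subleveldown module used with level-mem
  const db = new MemoryLevel()
  const people = db.sublevel('people', { valueEncoding: 'json' })

  await people.put('alice', { score: 1 })
  console.log(await people.get('alice')) // { score: 1 }
}

main()
```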
Puts on `level-mem` versus `memory-level` versus `memory-level` using strings internally. Double win.
`iterator.next()` on `level-mem` versus `memory-level`, using json and utf8 valueEncodings. No difference (because the main cost is `setImmediate`).
`iterator.next()` on `level-mem` versus `iterator.nextv(1000)` on `memory-level`. Not a fair benchmark, but the new `nextv()` API is an obvious win.
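For reference, a minimal sketch of consuming an iterator with the new API (assuming `memory-level` and entries shaped as `[key, value]` arrays):

```js
const { MemoryLevel } = require('memory-level')

async function main () {
  const db = new MemoryLevel()
  await db.put('a', '1')
  await db.put('b', '2')

  const iterator = db.iterator()

  try {
    // nextv(size) yields up to `size` entries per call, and an empty
    // array once the iterator is exhausted. This replaces one next()
    // call (and its setImmediate round trip) per individual entry.
    let entries
    while ((entries = await iterator.nextv(1000)).length > 0) {
      for (const [key, value] of entries) {
        console.log(key, value)
      }
    }
  } finally {
    await iterator.close()
  }
}

main()
```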
`iterator.next()` on `level` versus `iterator.next()` on `classic-level`. Slower. I reckon that's because I changed the structure of the cache (in short: `[entry, entry, ..]` instead of `[key, value, key, value, ..]`) which should make `nextv()` faster. That'll be difficult to compare fairly.
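A hypothetical illustration of the two layouts (made-up data, not the actual cache code):

```js
// Old layout: one flat array with keys and values interleaved
const flat = ['key1', 'value1', 'key2', 'value2']

// New layout: one array element per entry
const entries = [['key1', 'value1'], ['key2', 'value2']]

// With the entry layout, nextv() can hand out a chunk in a single slice:
const chunk = entries.slice(0, 2)

// With the flat layout, it has to loop and re-pair keys and values:
const paired = []
for (let i = 0; i < flat.length; i += 2) {
  paired.push([flat[i], flat[i + 1]])
}
```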
Batch puts on `level-mem` versus `memory-level`. Win.
Gets on `level-mem` versus `memory-level`. Win.

However, `memory-level` is slower when using a binary valueEncoding. That warrants a closer look.
> However, `memory-level` is slower when using a binary valueEncoding. That warrants a closer look.
It's not due to binary. It happens on any encoding when the code path that rebuilds `options` with a spread is triggered. V8 has a performance issue with the spread operator when properties are not present. The following "fixes" it:

```js
// Assigning the properties first makes them present on `options`,
// which sidesteps V8's slow path for spreading absent properties
options.keyEncoding = keyFormat
options.valueEncoding = valueFormat
options = { ...options, keyEncoding: keyFormat, valueEncoding: valueFormat }
```
As does using `Object.assign()` instead of spread:
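A sketch of that variant, assuming the same `options`, `keyFormat` and `valueFormat` locals as above:

```js
// Object.assign() copies onto a fresh object without triggering
// V8's slow path for spreading objects with absent properties
options = Object.assign({}, options, {
  keyEncoding: keyFormat,
  valueEncoding: valueFormat
})
```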
Could switch to `Object.assign()` but I do still generally prefer the spread operator, for being idiomatic (not being vulnerable to prototype pollution could be another argument, but I don't see how that would matter here).
The same `get()` performance regression exists on `classic-level`. Using `Object.assign()` would fix it.
Quick-and-dirty benchmark of streams, comparing `nextv()` to `next()`. Ref https://github.com/Level/community/issues/70 and https://github.com/Level/read-stream/pull/2. Unrelated to `abstract-level`, but it's a win.
```
classic-level | using nextv() | took 1775 ms, 563380 ops/sec
classic-level | using nextv() | took 1577 ms, 634115 ops/sec
classic-level | using nextv() | took 1549 ms, 645578 ops/sec
classic-level | using nextv() | took 1480 ms, 675676 ops/sec
classic-level | using nextv() | took 1572 ms, 636132 ops/sec
avg 1591 ms

level | using next() | took 1766 ms, 566251 ops/sec
level | using next() | took 1776 ms, 563063 ops/sec
level | using next() | took 1737 ms, 575705 ops/sec
level | using next() | took 1711 ms, 584454 ops/sec
level | using next() | took 1729 ms, 578369 ops/sec
avg 1744 ms
```
Did a better benchmark of streams. This one takes some explaining. In the graph legend below:

- **new-nextv**: `level-read-stream` on a `classic-level` iterator using `nextv(size)`
- **old-next**: `level().createReadStream()`, i.e. `level-iterator-stream` on a `leveldown` iterator using `next()`
- **new-next**: `level-read-stream` on a `classic-level` iterator using `next()` (a temporary code path for fair benchmarking)

In all cases the byte-hwm was sized to fit the `nextv(size)` array, because otherwise `classic-level` would hit the byte-hwm first and return partially filled arrays (I adjusted `level-iterator-stream` so that there's a way to specify both byte-hwm and stream-hwm). In such a way that the byte-hwm is effectively ignored and we can compare the effect of merely increasing stream-hwm.

Where "byte-hwm" is the highWaterMark on the C++ side, measured in bytes. And "stream-hwm" is the highWaterMark of object-mode streams, measured in amount of entries.
That's about half of the explainer needed... In hindsight I wish I didn't do the `abstract-level` and `nextv()` work in parallel. So please allow me to just skip to conclusions (and later document how a user should tweak their options):

- The new `nextv(size)` is faster than `next()` (compare new-nextv to old-next).
- The changes made for `nextv(size)` make `next()` slightly slower (compare old-next to new-next).
- Both `next()` and the new `nextv(size)` can be tweaked through options, but `nextv(size)` can't be beaten.

TLDR: we're good. Most importantly, the performance characteristics of streams and iterators did not change, in the sense that an app using smaller or larger values (I used 100 bytes) would not be hurt by upgrading to `abstract-level` or `classic-level`. That's because `leveldown` internally already had two highWaterMark mechanisms; `classic-level` merely "hoists" one of them up to streams. So if an app has extremely large values, we will not prefetch more items than before. If an app has small values, we will not prefetch less than before. If an app is not using streams, iterators still do prefetch (as you can see later when I finally push all code).
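To make the two knobs concrete, here's a sketch of how they could be specified together. Assumptions: `level-read-stream` exposes `EntryStream` and forwards unknown options to `db.iterator()`, and `classic-level` names the byte-hwm option `highWaterMarkBytes`.

```js
const { ClassicLevel } = require('classic-level')
const { EntryStream } = require('level-read-stream')

const db = new ClassicLevel('./example-db')

const stream = new EntryStream(db, {
  highWaterMark: 1000,           // stream-hwm: entries buffered by the stream
  highWaterMarkBytes: 16 * 1024  // byte-hwm: bytes fetched per C++ call
})

stream.on('data', (entry) => {
  console.log(entry.key, entry.value)
})
```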
> and later document how a user should tweak their options
Compare:

- `level@7` against `classic-level`. To test the native part (now using encoding options instead of `*asBuffer`). Not expecting a change here.
- `level-mem` + `subleveldown` against `memory-level` + sublevels, with json encoding. To test the JS part. Expecting a slight improvement here, though it might only surface in real-world apps (with GC pressure) rather than a synthetic benchmark.
- `level-mem` against `memory-level`, as it removes two (or three, with sublevels) iterations of the batch array.

A quick benchmark is enough (of reads and writes). It's just to check that performance is equal or better.
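For instance, a quick-and-dirty sketch of such a benchmark (hypothetical, not the actual suite; assuming `memory-level` with json encoding):

```js
const { MemoryLevel } = require('memory-level')

async function bench (n = 100_000) {
  const db = new MemoryLevel({ valueEncoding: 'json' })

  // Writes: one batch of n entries
  const ops = []
  for (let i = 0; i < n; i++) {
    ops.push({ type: 'put', key: 'key' + i, value: { i } })
  }
  console.time('batch put')
  await db.batch(ops)
  console.timeEnd('batch put')

  // Reads: n sequential gets
  console.time('gets')
  for (let i = 0; i < n; i++) {
    await db.get('key' + i)
  }
  console.timeEnd('gets')
}

bench()
```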