Decide on sort order - Githubissues

vweevers commented 5 years ago

@Level/core The way that memdown compares keys is starting to show its limitations. To fix that (or not) we have a few options. The goal is to increase ecosystem compatibility (like integrating with subleveldown), which doesn't necessarily mean "behave exactly like leveldown".

1. Keep memdown as-is.

Drawback: you can only safely use one key type in your db. When mixing types - e.g. db.put(str) followed by db.put(buf) - keys may unexpectedly be treated as equal. This issue exists with strings vs buffers, strings vs numbers and more.

Ref: https://github.com/Level/subleveldown/issues/64, https://github.com/Level/level-ttl/issues/68#issuecomment-480505112, https://github.com/Level/community/issues/58#issuecomment-487354041, https://github.com/Level/abstract-leveldown/pull/310 and possibly https://github.com/noffle/kappa-view-level/commit/fdf775de76abdffdcbbd73a2318de08a55bd0ec1.

2. Bring memdown closer to level-js, by sorting keys by type first, then content (#185)

Mixing types in your db is OK. The behavior does however differ from leveldown, where buffers and strings are both stored as byte arrays, while memdown will sort and store them separately. For example, db.put('a') will write to a different key than db.put(Buffer.from('a')).

3. Bring memdown closer to leveldown, by storing everything as a Buffer.

Drawbacks: conversion cost (especially when you're solely working with strings, which is common), reduced type support.

4. Bring memdown closer to leveldown, by comparing strings and buffers bytewise

Possibly combined with #185, I don't know what it would look like. Downside: this will be a third distinct behavior, sitting in between leveldown and level-js. Alternatively (and radically) we could drop support of key types other than strings and buffers.

5. Long-term: bring encodings into abstract-leveldown (https://github.com/Level/community/issues/58)

This could potentially allow implementations to be smarter about how they store and return types.

rvagg commented 5 years ago

is starting to show its limitations

Is this just from https://github.com/Level/subleveldown/issues/64 or has this been coming up a lot?

I like the straightforwardness of option 3, even if there is a bit of a cost. It's not a terribly huge cost but maybe benchmarks would help quantify if we have benchmarks we can throw at before & after.

As for "reduced type support." how much of a problem is this going to be? leveldown only supports strings and Buffers and abstract-leveldown only really says strings and Buffers are important for implementations. Have you seen downstream use cases where people aren't using strings or Buffers (or Buffer-like arrays)?

vweevers commented 5 years ago

Is this just from Level/subleveldown#64 or has this been coming up a lot?

Also level-ttl, abstract-leveldown (it can't consume its own data which hurts our ability to provide fallback behaviors for new features), see references tickets above.

As for "reduced type support." how much of a problem is this going to be? leveldown only supports strings and Buffers

True, it's not a problem if you use memdown solely as a fallback for leveldown. This is a limiting perspective though. Consider level-js, which supports bytewise-like keys out of the box without encodings, instead relying on native (fast) browser features. Option 2 above would make memdown suitable as a companion to level-js, for pure in-browser usage (while remaining suitable as a leveldown fallback). For example, a browser extension can persist its data to IDB with level-js and use memdown as a caching layer.

abstract-leveldown only really says strings and Buffers are important for implementations

What "important" means here is that only these types are supported across the board and tested in the abstract test suite. An implementation that supports other types is itself responsible for testing them.

Have you seen downstream use cases where people aren't using strings or Buffers (or Buffer-like arrays)?

Admittedly, apart from my own use, no. We've been trying to take memdown and level-js out of the shadow of leveldown, give them their own distinctive features, playing with alternative means of encoding and storing data, but AFAICT this hasn't taken off as I hoped it would.

vweevers commented 5 years ago

Another option: have both level-js and memdown behave like leveldown when it comes to strings and buffers (by internally converting strings to buffers, or an ArrayBuffer if that's faster in level-js). They can still support additional types on top of that.

I'm starting to like this option, because level-js and memdown having a different behavior than leveldown will surely lead to issues sooner or later. Those issues will be subtle and hard to pinpoint.

First, lemme get back to the level-bench benchmarks, because we only have write benchmarks atm. Will also have to figure out a way to run the benchmarks in the browser.

ralphtheninja commented 5 years ago

Another option: have both level-js and memdown behave like leveldown when it comes to strings and buffers (by internally converting strings to buffers, or an ArrayBuffer if that's faster in level-js). They can still support additional types on top of that.

This sounds almost like option 3?

I'd go for any option that solves real issues and I prefer solutions that simplify and remove confusion. It seems to me this cost is paid by conversion costs and that's fine if that cost isn't too high.

vweevers commented 5 years ago

This sounds almost like option 3?

Yeah, it's 3, but also applied to level-js, which kind of simplifies the problem space, because it's no longer about 3 different behaviors (memdown vs level-js vs leveldown), but 2.

vweevers commented 5 years ago

@peakji Any thoughts? Even if you only use leveldown (any perspective is welcome).

vweevers commented 5 years ago

I propose the following changes:

Drop key types other than strings and buffers from memdown and level-js. If someone wants that type support back, they can fork. Long-term we may bring back some of the functionality in another form, depending on what we do with encodings and the merger of abstract-leveldown with levelup.
In memdown, store strings as buffers. We'll only need one simple comparator. Prefer compatibility over raw performance. If someone doesn't like the conversion cost, they can fork (or go lower-level, i.e. using functional-red-black-tree directly).
In level-js, store strings as (array)buffers. Continue to use IndexedDB's native comparator.

The custom type support was a worthwhile experiment IMO that incidentally helped to clean up code overall. In absence of actual usage, we can end this experiment. This GH thread has been open long enough for people to raise objections.

rvagg commented 5 years ago

sgtm, you could even consider aiming for switching memdown (maybe even abstract-leveldown) to ArrayBuffers too, migrate as much as possible to UInt8Arrays for maximal compatibility and potentially even remove the need for Buffer browser polyfill.

vweevers commented 5 years ago

Later. The primary goal is to increase compatibility with the level ecosystem (which mostly deals with Buffers) so I want to stick with Buffers for now. The proposed changes will fix issues with subleveldown, level-ttl and abstract-leveldown. I don't want to attack a larger scope than that at once.

vweevers commented 5 years ago

Continuing at:

Level / memdown

Decide on sort order #186