Level / abstract-leveldown

An abstract prototype matching the leveldown API.
MIT License
146 stars 53 forks source link

Serialization #277

Closed vweevers closed 6 years ago

vweevers commented 6 years ago

Closes #120, closes #130, closes #230. Checklist adapted from https://github.com/Level/abstract-leveldown/issues/120#issuecomment-390171073:

These are very tricky changes, will need canary testing in at least leveldown.

Externally (just mental notes):

vweevers commented 6 years ago

Should also allow empty range options (zero-length string or zero-length Buffer) in _setupIteratorOptions.

As @dominictarr wrote in https://github.com/Level/encoding-down/issues/42:

[abstract-leveldown now] strips out nullish ranges, which is reasonable for leveldown, but was supported in levelup, and null, undefined and '' are significant values in some encodings (bytewise, charwise).

Empty range options should be OK for leveldown, because IIRC it ignores such options in the C++.

vweevers commented 6 years ago

Problem: in encoding-down, range options must be serialized after encoding them.

vweevers commented 6 years ago

Problem: in encoding-down, range options must be serialized after encoding them.

No :) The order doesn't matter in encoding-down, as its _serializeKey is an identity function.

vweevers commented 6 years ago

@ralphtheninja WDYT about:

Note that before this PR (specifically, dec6278), keys that would likely serialize to an empty string were rejected:

https://github.com/Level/abstract-leveldown/blob/d7411bbef7c7b8bd5f144db24b35a6cbdcd918f6/abstract-leveldown.js#L247-L249

And this String(obj) call was kinda wasteful.

vweevers commented 6 years ago

I'm leaning towards a yes, because we gotta make sure we're not passing empty string keys to leveldown, which segfaults on empty slices, or at least it used to. @ralphtheninja is that still the case, or does the C++ guard against that nowadays?

ralphtheninja commented 6 years ago

is that still the case, or does the C++ guard against that nowadays?

Hmm, I would have to check.

vweevers commented 6 years ago

Might be moot. I'm thinking if leveldown doesn't support empty keys, other implementations shouldn't either for consistency. So it's something that abstract-leveldown should guard against.

ralphtheninja commented 6 years ago
var leveldown = require('.')                         
var db = leveldown('./db')                           
db.open(function (err) {                             
  db.put('', 'VALUE', function (err) {               
    console.log('after put', err)                    
    db.get('', function (err, value) {               
      console.log('after get', err, value.toString())
    })                                               
  })                                                 
})                                                   

If I comment out _checkKey in abstract-leveldown I get:

after put undefined
after get null VALUE
vweevers commented 6 years ago

@ralphtheninja huh, I did not expect that!

vweevers commented 6 years ago

Ah, the segfaults were not about empty keys, but empty range options: https://github.com/Level/levelup/issues/112.

Still, we never supported empty keys, AFAICS. The check was introduced here: https://github.com/Level/abstract-leveldown/commit/4015c7b8707783a8497bd8336ee8bbaf16f0e85d, moved to prototype here: https://github.com/Level/abstract-leveldown/commit/da0ef1310ab0d43edbbb9bfa0f425a2c8afc4af9, then stayed the same until today.

vweevers commented 6 years ago

We have three options, in pseudocode:

Oh, how I hate this.

vweevers commented 6 years ago

I made up my mind. I want to keep the current logic - serialize(check(key)) - because serialization is meant to convert a key to a type supported by the underlying storage. That can be any type, we shouldn't care. We check user input, yes, but once we pass data to be serialized, it's out of our hands.

What matters more, is how the data comes out of the storage - i.e. it shouldn't be nullish because of preexisting significance in iterators and streams. We don't have a runtime guard against that atm, or a good mechanism in place to support such a guard. Adding more input checks certainly doesn't help, and we can't write abstract tests for it because we can't make assumptions about serialization or type support. So... that's something to attack later (or not).

For now, rejecting nullish as input should be enough of a signal to consumers and implementors. In addition, I'd like to keep rejecting empty keys, simply because that's how it's always been.

The only gap left is that before this PR, abstract-leveldown rejected String(key) === ''. This is a real edge case though, because all JS types for which String(key) === '' are already rejected: strings, arrays and buffers. The only type that can pass through is an object with a custom toString() that returns ''.

vweevers commented 6 years ago

Tested against leveldown#next with one addition:

LevelDOWN.prototype._serializeKey =
LevelDOWN.prototype._serializeValue = function (datum) {
  return Buffer.isBuffer(datum) ? datum : String(datum)
}
vweevers commented 6 years ago

After this I think we should add next branches to memdown and level-js.

vweevers commented 6 years ago

After this I think we should add next branches to memdown and level-js.

And encoding-down (where we should be able to remove _setupIteratorOptions).

ralphtheninja commented 6 years ago

And encoding-down (where we should be able to remove _setupIteratorOptions).

Good. This will make @dominictarr happy!

vweevers commented 6 years ago

done-downsized

ralphtheninja commented 6 years ago

Was nice to finally get this merged!

ralphtheninja commented 6 years ago

HAHAHAHA there we have it. A perfect gif. I think we should use it in the release notes!