MatrixAI / js-db

Key-Value DB for TypeScript and JavaScript Applications
https://polykey.com
Apache License 2.0
5 stars 0 forks source link

Updating `keyPathToKey` to escape key parts #22

Closed emmacasolin closed 2 years ago

emmacasolin commented 2 years ago

Description

There was a bug with the way that keypaths were encoded into keys, in that a keypath with no level part could be encoded to look like an encoded keypath that does have a level. This is because the key part was previously left as is during encoding.

For example, the key ['\0ABC\0DEF'] would be encoded as \0ABC\0DEF, however, this would be decoded as ['ABC', 'DEF']. So separators in the key part could be misinterpreted as level separators.

In order to fix this, we need to escape separators in the keypath as well as the level part. We were only doing this for the level part previously. Then, on the decoding side, we need to unescape the key part as well.

Issues Fixed

Tasks

Final checklist

emmacasolin commented 2 years ago

The key path parsing tests (including new edge cases) are all passing with the following changes:

However, some of the other tests are failing now. Specifically

I'll start looking into these now

emmacasolin commented 2 years ago

The first problem I'm noticing is actually a place where we're expecting that a key that looks like a level path is actually a level path.

When we're clearing a sublevel, everything that comes after the level path of the sublevel we want to clear is returned from iterating over that level path, which returns the remainder of the path as a key buffer that we then append to the level path to delete it

public async _clear(levelPath: LevelPath = []): Promise<void> {
  for await (const [k] of this._iterator({ values: false }, levelPath)) {
    await this._del([...levelPath, k]);
  }
}

The del then relies on the key being encoded as is, since in this case it's actually a level path.

public async _del(keyPath: KeyPath): Promise<void> {
  return this._db.del(utils.keyPathToKey(keyPath));
}

So the separators, which are meant to be treated as actual separators, are being escaped now.

The solution is probably just to parse the keypath returned from the iterator before appending it.

CMCDragonkai commented 2 years ago

I suggest that the problem is double escaping.

Since if we escape the terminal part of the keypath, then the fully escaped key stored in the DB.

Now during iteration it takes the fully escaped key, unescapes it all and splits it into parts.

It has to do this in order to "split" out the relevant level.

The remaining parts then is currently being put back into a key using keyPathToKey. However this causes double escaping subsequently, because later the resulting key is put into keyPathToKey function again.

So the solution can be to make the iterator not escape when calling keyPathToKey.

Ideally we want to return the KeyPath itself. This means an AbstractIterator type of AbstractIterator<KeyPath, Buffer> instead of AbstractIterator<Buffer, Buffer>. This would mean that subsequent usage like the _clear function:

  public async _clear(levelPath: LevelPath = []): Promise<void> {
    for await (const [k] of this._iterator({ values: false }, levelPath)) {
      await this._del([...levelPath, k]);
    }
  }

Would actually have to change to:

  public async _clear(levelPath: LevelPath = []): Promise<void> {
    for await (const [keyPath] of this._iterator({ values: false }, levelPath)) {
      await this._del([...levelPath, ...keyPath]);
    }
  }

But this would impact everything like EFS and PK, and it's not backwards compatible.

So if the alternative is to ensure that the k is not escaped.

Perhaps the keyPathToKey can have a boolean parameter that determines whether to escape or not. Then in the _iterator it will just not escape it.

          // Truncate level path so the returned key is relative to the level path
          const keyPath = utils.parseKey(kv[0]).slice(levelPath.length);
          // Don't escape the key, because it will be used as the `keyActual`
          kv[0] = utils.keyPathToKey(keyPath, false);
          if (options?.keyAsBuffer === false) {
            kv[0] = kv[0].toString('utf-8');
          }

There's a problem here though, the keyPathToKey will end up escaping the the level parts normally. That's fine, one could potentially not escape the level parts too.

But now what about the separator between the parts. Even if the level parts and final key part is not escaped, the separators would end up being escaped and then we're back to the original problem.

It seems no matter what the main problem is that iterator claims to return the raw key to the value which is true. But then users of the iterator expect that they can just treat it like a raw key which is not true because users of the iterator are forced to pass through the key encoding again.

It seems users of the system just has to deal with downstream changes. Either we have one of these 2:

  // here the iterator returns the raw key
  // and to use it must be parsed again with `parseKey`
  public async _clear(levelPath: LevelPath = []): Promise<void> {
    for await (const [k] of this._iterator({ values: false }, levelPath)) {
      await this._del([...levelPath, ...parseKey(k)]);
    }
  }

  // here the iterator has changed to return the keyPath
  // more efficient since it avoids needless re-encoding
  // changes API with leveldb though
  public async _clear(levelPath: LevelPath = []): Promise<void> {
    for await (const [keyPath] of this._iterator({ values: false }, levelPath)) {
      await this._del([...levelPath, ...keyPath]);
    }
  }

Note that the options of keyAsBuffer we haven't integrated yet as it is still in #19, perhaps that can be used to help.

To some extent we are already leaving levelDB API, so maybe that's what we will need to do.

CMCDragonkai commented 2 years ago

Subleveldown prevented this problem by not allowing separators in the level parts. Therefore the parsing of the key is simple, just whatever is left after the last separator.

I think this solution is preferable:

  // here the iterator has changed to return the keyPath
  // more efficient since it avoids needless re-encoding
  // changes API with leveldb though
  public async _clear(levelPath: LevelPath = []): Promise<void> {
    for await (const [keyPath] of this._iterator({ values: false }, levelPath)) {
      await this._del([...levelPath, ...keyPath]);
    }
  }

So this just means the iterator doesn't give you back raw keys, but instead key paths. If any user wants the raw key, they can call dbUtils.keyPathToKey.

CMCDragonkai commented 2 years ago

Let me see if I can roll this into the SI branch because I'm already changing the iterators supporting keyAsBuffer and valueAsBuffer, and let's see.

CMCDragonkai commented 2 years ago

This will be solved in #19, and cherry-picked into master or staging.