MatrixAI / js-db

Key-Value DB for TypeScript and JavaScript Applications
https://polykey.com
Apache License 2.0
5 stars 0 forks source link

Upgrade to abstract-level interface #11

Closed CMCDragonkai closed 2 years ago

CMCDragonkai commented 2 years ago

Specification

The abstract-level https://github.com/Level/abstract-level is a new foundation of LevelDB. Upgrading to this will help us resolve all other issues in DB.

5 still cannot be solved until IDB becomes available, then we can use browser-level or level-js and have the transparent encryption be done at the block level. Or done at the C++ level inside leveldb, or rocksdb.

Wait until level has released version 8, although this can be started now, since we may not need to use the level package at all. The only dependencies of js-db may be abstract-level and classic-level.

Additional context

Tasks

  1. ...
  2. ...
  3. ...
CMCDragonkai commented 2 years ago

I want to change the API interface to:

get(keys: string | Buffer | Array<string | Buffer>)

So that way get('key') or get(['level1', 'key']) can work.

And a way to create deeply nested levels more easily: https://github.com/Level/subleveldown/issues/112

CMCDragonkai commented 2 years ago

We should remove any usage and management of sublevels. Unless the sublevels themselves are just DB objects. But mostly it should be possible to just directly use the DB object instance and use key paths.

The APIs between DB and DBTransaction is still different because we want to eventually move to where you can just specify a key path instead of directly maintaining particular sublevel objects. Once all methods can just take a full keypath, there's no need to manage a bunch of sublevel objects, and that should reduce the object maintenance overhead.

// root iterator at dataDb
DB.iterator(options);
// same
DB.iterator(options, []);
// down level 1
DB.iterator(options, ['level1']);
DB.iterator(options, ['level1', 'level2]);
DBTransaction.iterator(options, ['level1', 'level2']);

DB.count();
DB.count(['level1', 'level2']);
DBTransaction.count();
DBTransaction.count(['level1']);

DB.clear();
DB.clear(['level1']);
DBTransaction.clear(['level1']); // this should be a transactional clear, to do this, we must iterator and then do `db.del`

DB.get('key');
DB.get(['level1', 'key']);
// these are not allowed, or they are equivalent to asking for undefined
DB.get([]);
DB.get();

DB.put('key', 'value');
DB.put(['level1', 'key'], 'value');

DB.del('key');
DB.del(['level1', 'key']);
DBTransaction.del('key');
DBTransaction.del(['level1', 'key']);
CMCDragonkai commented 2 years ago

We will need to test and benchmark sublevel creation multiple times to see how fast we can do this though, because somethings like iterator and count will require creating those sublevels.

CMCDragonkai commented 2 years ago

Some benchmarking results show that naively creating sublevels in order to support the key paths API or the domain APIs is quite slow:

  create 1 sublevels:
    87 320 ops/s, ±0.79%   | fastest

  create 2 sublevels:
    45 325 ops/s, ±0.55%   | 48.09% slower

  create 3 sublevels:
    30 294 ops/s, ±0.39%   | 65.31% slower

  create 4 sublevels:
    22 764 ops/s, ±2.34%   | slowest, 73.93% slower

  get via sublevel:
    25 055 ops/s, ±1.36%   | 71.31% slower

  get via key path concatenation:
    54 992 ops/s, ±2.68%   | 37.02% slower

Therefore it appears that if we want to have an efficient implementation, instead of just creating intermediate sublevel objects, we instead should be doing key path/prefix concatenation.

Now this is what we do when we use get and put and del, however the methods like level and count and dump and iterator are not, and are just creating sublevel objects.

For level, count, dump and iterator, we fundamentally need subleveldown to support taking the concatenated prefix, or an array type as the prefix in https://github.com/Level/subleveldown/issues/112#issuecomment-1075904007. But this is prevented by subleveldown's RangeError that is thrown https://github.com/Level/subleveldown/issues/111#issuecomment-1070587202.

Basically it seems like db.sublevel(['foo', bar']) would be implemented efficiently as db.sublevel('foo!bar') so that the key path is just concatenated rather than creating intermediate sublevel objects.

Our in our case:

subleveldown(dbLevel, `foo${utils.prefix}bar`)
// or
subleveldown(dbLevel, ['foo', 'bar'])
CMCDragonkai commented 2 years ago

I suggest that we remove subleveldown entirely, and just allow key paths for both iterators and get, put, del and clear... etc. Most custom operations like dump, clear, count all rely on the iterator. And we don't need subleveldown if we run our iterator based on a specific key path for gt and lt`.

I reckon it would be something like this:

iterator({
  // gt means greater than this key
  // since !A!B! would be the domain path, any key added would always be 1 character more
  gt: '!A!B!',
  // this would have to be the very next byte number above A!B, but that which doesn't produce an extra character, we could create a buffer and just increment by 1 the last byte
  // if `!` is the separator which is 0x21, then plus 1 would be `"` which is 0x22
  lt: '!A!B"'
})

Then we probably solve a number of problems.

CMCDragonkai commented 2 years ago

This script proves we can just use gt and lt to get a proper iteration:

import DB from './src/DB';

async function main () {
  const db = await DB.createDB({
    dbPath: './tmp/db',
    fresh: true
  });

  const l1 = await db.level('level1');
  await db.put(['level1'], 'k', 'v');
  await db.put(['level1'], 'a', 'v');
  await db.put(['level1'], 'c', 'v');

  await db.level('level2');
  await db.put(['level2'], 'k', 'v');
  await db.put(['level2'], 'a', 'v');
  await db.put(['level2'], 'c', 'v');

  await db.put([], '!level1"', 'v');
  await db.put([], '!level1 ', 'v');

  console.log('USING SUBLEVELDOWN');
  for await (const k of l1.createKeyStream()) {
    console.log(k.toString());
  }

  console.log('ITERATOR');
  const i = db.dataDb.iterator({
    gt: '!level1!',
    lt: '!level1"'
  });
  // @ts-ignore
  for await (const [k, v] of i) {
    console.log(k.toString());
  }

  console.log('KEY STREAM');
  for await (const k of db.dataDb.createKeyStream({
    gt: '!level1!',
    lt: '!level1"'
  })) {
    console.log(k.toString());
  }

  console.log('DUMP');
  console.log(await db.dump());

  await db.stop();
}

main();

I just realised that dump doesn't preserve order, and that it assumes keys can just be strings. We can change this to an array of tuples, where the first tuple is always a buffer, or a binary string, and the value is decrypted as usual. If I want to use binary string I can use Buffer.toString('binary'), not this is the same as if I used String.fromCharCode(...key), but I used that in js-id due to Uint8Array.

The raw === true can be used to return buffers for both.

CMCDragonkai commented 2 years ago

With this prototype, I might just go ahead with getting rid of subleveldown entirely, and then that would make this 3.0.0 as the APIs are again different. Then that would affect PK quite significantly.

CMCDragonkai commented 2 years ago

Even without using subleveldown, the DB can still use abstract-level, and we sort of rely less on what the underlying system can do.

CMCDragonkai commented 2 years ago

I missed one thing, sublevels are actually indexed like this:

!A!!B!k

So between sublevels we have !! not just a single !.

CMCDragonkai commented 2 years ago

Due to #12, the 3.0.0 DB has been released and the interface shouldn't change all that much even with upgrade to abstract-level.

CMCDragonkai commented 2 years ago

https://github.com/MatrixAI/js-db/pull/13 has enabled the ability to arbitrarily use level parts without worrying about reserved bytes.