MatrixAI / js-db

Key-Value DB for TypeScript and JavaScript Applications
https://polykey.com
Apache License 2.0
5 stars 0 forks source link

Subleveldown is now unnecessary, and it can be replaced with keypaths #12

Closed CMCDragonkai closed 2 years ago

CMCDragonkai commented 2 years ago

Description

By getting rid of subleveldown, the API of DB changes quite significantly but also much more simpler:

// root iterator at dataDb
DB.iterator(options);
// same
DB.iterator(options, []);
// down level 1
DB.iterator(options, ['level1']);
DB.iterator(options, ['level1', 'level2]);
DBTransaction.iterator(options, ['level1', 'level2']);

DB.count();
DB.count(['level1', 'level2']);
DBTransaction.count();
DBTransaction.count(['level1']);

DB.clear();
DB.clear(['level1']);
DBTransaction.clear(['level1']); // this should be a transactional clear, to do this, we must iterator and then do `db.del`

DB.get('key');
DB.get(['level1', 'key']);
// these are not allowed, or they are equivalent to asking for undefined
DB.get([]);
DB.get();

DB.put('key', 'value');
DB.put(['level1', 'key'], 'value');

DB.del('key');
DB.del(['level1', 'key']);
DBTransaction.del('key');
DBTransaction.del(['level1', 'key']);

Firstly we won't have to deal with creating sublevels in an inefficient way: https://github.com/MatrixAI/js-db/issues/11#issuecomment-1075905887. Instead we just do everything through key concatenation.

We still need a separator which we can preserve as ! as the default.

For iteration, we would need to use https://github.com/MatrixAI/js-db/issues/11#issuecomment-1079592712

iterator({
  // gt means greater than this key
  // since !A!!B! would be the domain path, any key added would always be 1 character more
  gt: '!A!!B!',
  // this would have to be the very next byte number above !A!!B!, but that which doesn't produce an extra character, we could create a buffer and just increment by 1 for the the last byte
  // if `!` is the separator which is 0x21, then plus 1 would be `"` which is 0x22
  lt: '!A!!B"'
})

This also means all downstream packages that use DB will only just use the DB instance, they don't bother creating sublevel instances for no reason. There's no need to keep around sublevel instances anymore. This also reduces one headache which is that you have to "recreate" the sublevels whenever you stop the underlying database.

See how in our destroy methods such as SessionManager it requires:

  public async destroy() {
    this.logger.info(`Destroying ${this.constructor.name}`);
    const sessionsDb = await this.db.level(this.sessionsDbDomain);
    await sessionsDb.clear();
    this.logger.info(`Destroyed ${this.constructor.name}`);
  }

That's just unnecessary now.

You can just do:

  public async destroy() {
    this.logger.info(`Destroying ${this.constructor.name}`);
    await db.clear([this.sessionsDbDomain]);
    this.logger.info(`Destroyed ${this.constructor.name}`);
  }

This will probably mean a change to how we do our db domains, we just need to keep track of the levels we are operating on and that's it.

Issues Fixed

Tasks

Final checklist

CMCDragonkai commented 2 years ago

The domainPath function differentiates keys from paths.

When using the domain path, we now have to consider what is the user trying to do.

// this means sublevel of A then B
await db.iterator(options, ['A', 'B']);
// does this mean sublevel of A, or just A the key?
// it has to be sublevel of A obviously, since if the key is the only thing, then use the options parameter
await db.iterator(options, ['A']);

// does this mean get A the key?
// or does it mean get A the sublevel?
// it would have to mean A the key, since you cannot get a sublevel
await db.get(['A']);

await db.get(['A', 'B']);

This means we should differentiate DomainPath from KeyPath.

A KeyPath means that the last element is always the final key.

A DomainPath means that the last element is still a sublevel.

CMCDragonkai commented 2 years ago

Therefore:

get(keyPath: KeyPath)
put(keyPath: KeyPath)
del(keyPath: KeyPath)

iterator(options, domainPath: DomainPath)
count(domainPath: DomainPath)
clear(options, domainPath: DomainPath)
dump(domainPath: DomainPath)

We can get rid of batch since that's done by Transaction now.

CMCDragonkai commented 2 years ago

Might prefer LevelPath over DomainPath as it is more precise, and specific to leveldb, but even other ordered key values can match the same idea.

CMCDragonkai commented 2 years ago

Additionally we can say LevelPath can be empty to indicate the root level, while KeyPath has to be filled as you must refer to a single key at the very least.

CMCDragonkai commented 2 years ago

We now have:

keyPathToKey: KeyPath => Buffer
levelPathTokey: LevelPath => Buffer
parseKey: Buffer => KeyPath

This allows us to get rid of sublevel-prefixer, and we have a much more easier to understand code here.

CMCDragonkai commented 2 years ago

The default separator is changed 0x00 which means the 1 byte above is 0x01. These are unprintable characters in ASCII, so when key path are dumped using binary encoding, we will see \x00 and \x01. This frees up the!` the likelihood of clashes becomes less likely for string sublevels.

If we were really intending to allow sublevels to be arbitrary buffers, we would also need to do escaping.

But this complicates things in DB iteration, and possibly one was to use order-preserving base encoding on all sublevels to avoid that one character of 0x00.

For now buffers can be used in level paths, but cannot be arbitrary buffers, or at least cannot contain the separator buffer of 0x00.

CMCDragonkai commented 2 years ago

In the process of reworking the iterator, I realised I had a bug in the the current master where the keys and values properties were typed as key and value for iterator method.

CMCDragonkai commented 2 years ago

This test is now no longer relevant and is removed:

  test('async start and stop requires recreation of db levels', async () => {
    const dbPath = `${dataDir}/db`;
    const db = await DB.createDB({ dbPath, logger });
    await db.start();
    let level1 = await db.level('level1');
    await db.put(['level1'], 'key', 'value');
    await db.stop();
    await db.start();
    // The `level1` has to be recreated after `await db.stop()`
    await expect(db.level('level2', level1)).rejects.toThrow(
      /Inner database is not open/,
    );
    level1 = await db.level('level1');
    await db.level('level2', level1);
    expect(await db.get(['level1'], 'key')).toBe('value');
    await db.stop();
  });
CMCDragonkai commented 2 years ago

We will need https://www.typescriptlang.org/tsconfig/#stripInternal as I now have "internal methods" that are shared between DB and DBTransaction, but lacking "friend" classes, these are public methods but they will be stripped.

https://stackoverflow.com/questions/45847399/friend-class-in-typescript

CMCDragonkai commented 2 years ago

This is ready to be merged now. All tests pass, and benchmarks have been re-ran.

The dependencies have been significantly pruned here.