Enable the usage of `withF` and `withG` context for DB transactions and provide transactional iterators

CMCDragonkai commented 2 years ago

Description

The withF and withG generalises resource acquisition and release. The DB transaction is one of those resources. Since we share a DB being used in PK, having these methods would be useful.

Long term, the withF and withG types and utility functions would be extracted into its own package like js-context so it can be shared.

Over time this should replace the DB.transact and DB.withLocks methods, and instead just provide the DB.transaction method.

The transactional iterator implementation follows the iterator on LevelDB including the exception behaviour like in leveldown and abstract-leveldown packages. This is a stop-gap solution until we migrate the whole DB implementation over abstract-level #11.

Downstream packages EFS and PK will need to be updated.

Issues Fixed

Related to https://github.com/MatrixAI/js-polykey/issues/294

Tasks

[x] 1. Use resource control functions for transactions (withF and withG) by integrating ResourceAcquire and ResourceRelease types into a DB.transaction method.

[x] 2. Add an iterator method into Transaction and DBTransaction interface so that we can perform transactional iteration. This will still use the AbstractIterator type, which is not currently up to date in upstream which will necessitate the usage of @ts-ignore comments. See: https://github.com/DefinitelyTyped/DefinitelyTyped/discussions/59265

This iterator must act like a "overlay", in that writes in the transaction take priority over data that is currently in the DB.

KEYS	DB	SNAPSHOT	RESULT
a	a = a	a = 1	a = 1
b	b = b		b = b
c		c = 3	c = 3
d	d = d		d = d
e	e = e	e = 5	e = 5
f		f = 6	f = 6
g
h	h = h		h = h
i
j		j = 10	j = 10
k	k = k	k = 11	k = 11

[x] 3. Remove the methods DB.withLocks and DB.transact as this uses locking code, and locks are not necessary for our read-committed isolation level transactions, advisory-locks for transactions are now up to the user to manage.
[x] 4. Create root sublevels such as data and transactions and eventually index to support temporary state, so that users of DB will only be storing data under data sublevel.
- This means that db.db shouldn't be used anymore, as the root of data is now in db.dataDb, the db.db is the still the underlying root LevelDB database, but this shouldn't be used unless you're doing something hacky on the LevelDB database.
[x] 5. Move withF and withG and ResourceAcquire and ResourceRelease types into @matrixai/resources package, which will be used by PK as well and subsequently EFS.
[x] 6. Add Transaction.test.ts to formally test the Transaction class, and formalise the 4 isolation-levels vs read-phenomena properties into the tests.
[x] 5. Removed DBTransaction interface, and renamed Transaction class to DBTransaction, this just simplifies the types
[x] 6. Removed AbstractIterator in favour of our own DBIterator since the upstream type is out of date, and this avoids having users having to do // @ts-ignore all the time.

Final checklist

[x] Domain specific tests
[x] Full tests
[x] Updated inline-comment documentation
[x] Lint fixed
[x] Squash and rebased
[x] Sanity check the final build

CMCDragonkai commented 2 years ago

In order to allow some streaming based functionality, we have to allow the creation of iterators inside the transaction.

We will preserve the read-committed isolation level, which means we create iterators against the DB inside the transaction, and each iterator then needs to operate against the snapshot and the real DB.

This means our snapshot needs to become an ordered key value structure.

This means we are creating another temporary DB to be the snapshot. However I'm going to first just try with a sublevel.

This would mean that there is a reserved area in the DB where transactions are working against, and this sublevel should never be used by any other places.

CMCDragonkai commented 2 years ago

The usage of the snapshot sublevel does impact any usage of streaming on the root DB though, because using would iterate over the sublevel snapshots too.

Furthermore we cannot just use snapshot sublevel, we would need to create a new sublevel for each snapshot that is bound to a transaction.

Alternatively we forego using a snapshot level entirely and just use an in-memory ordered map structure, but that is its own can of worms.

Most efficient would be:

Snapshots sublevel with snapshot sublevels keyed to unique transaction IDs
This enables later persistent snapshots too, though not sure if this would be useful. At the very least, snapshot can grow beyond memory size?
This would need the root DB to create a sublevel to actually be used, which is to say, the root db level woudn't be the real root db, but instead something under the root database. We could have data and snapshots as the 2 root sublevels.
Would this impact existing users? I don't think so, it should be fine.

CMCDragonkai commented 2 years ago

These additional "root" sublevels can be extended to deal with automatic indexing #1 which would then supersede #2. Which would give us:

!data
!snapshots
!index

Then index sublevels can be accessed separately and managed separately.

Additional relevant issue https://github.com/Level/subleveldown/issues/111

Also I noticed that a abstract-level is being created and that may evolve the leveldb system to be more sophisticated. In particular supporting Uint8Array https://github.com/Level/abstract-level

CMCDragonkai commented 2 years ago

The https://github.com/Level/abstract-level might be a good foundation for a new DB, which would then still use leveldown "now known as classic-level" as the node underlying database.

CMCDragonkai commented 2 years ago

I've rolled up a bunch of issues in js-db into larger epic https://github.com/MatrixAI/js-db/issues/11 that would be one big change to DB, that would be breaking relative to EFS and PK. So this PR will focus on just integrating withF and withG and the ability to use iterator inside a transaction. However this may be removing the transact and withLocks method which does break EFS, but that is an easy fix, since EFS doesn't actually use any of the DB locking, so it can just have withF integrated with the resource types.

CMCDragonkai commented 2 years ago

The DBTransaction now uses transactionDb instead of snapshot. The tran.iterator(['d1', 'd2']) can take DBDomain, this makes our iterator method a bit different from leveldb's iterator method. This is because our tran methods tend to operate from the root, and that's how the get, and put works. Remember that since levels are not using the DB interface, we had to do this.

Furthermore the transactionDb is encrypted because it is on disk.

The iterator method is a bit strange, I need to test what happens when the keys are different between transactionDb and the dataDb it is operating over... And the usage of seek. Further prototyping required.

CMCDragonkai commented 2 years ago

The changes to DB has impacted some tests.

    ✓ async construction constructs the filesystem state (13 ms)
    ✓ async destruction removes filesystem state (3 ms)
    ✓ async start and stop preserves state (18 ms)
    ✓ async start and stop preserves state without crypto (4 ms)
    ✓ async start and stop requires recreation of db levels (6 ms)
    ✓ creating fresh db (5 ms)
    ✓ get and put and del (11 ms)
    ✓ batch put and del (13 ms)
    ✕ db levels are leveldbs (6 ms)
    ✓ db levels are just ephemeral abstractions (4 ms)
    ✓ db levels are facilitated by key prefixes (12 ms)
    ✓ clearing a db level clears all sublevels (10 ms)
    ✕ lexicographic iteration order (9 ms)
    ✕ lexicographic buffer iteration order (112 ms)
    ✕ lexicographic integer iteration (7 ms)
    ✓ db level lexicographic iteration (10 ms)
    ✓ get and put and del on string and buffer keys (6 ms)
    ✕ streams can be consumed with promises (11 ms)
    ✓ counting sublevels (15 ms)
    ✓ parallelized get and put and del (428 ms)
    ✓ parallelized batch put and del (464 ms)
    ✓ works without crypto (6 ms)

I'm creating Transaction.test.ts now to test different usages of transaction and prototype the iterator usage.

CMCDragonkai commented 2 years ago

I have a working prototype for the tran.iterator.

It is able to handle this sort of data:

KEYS	DB	SNAPSHOT	RESULT
a	a = a	a = 1	a = 1
b	b = b		b = b
c		c = 3	c = 3
d	d = d		d = d
e	e = e	e = 5	e = 5
f		f = 6	f = 6
g
h	h = h		h = h
i
j		j = 10	j = 10
k	k = k	k = 11	k = 11

CMCDragonkai commented 2 years ago

Tests have been done for the transactional iterator.

Now it's time to clean up the tests for the DB.test.ts, and we can start to merge this and apply it to the NodeGraph.

CMCDragonkai commented 2 years ago

In this PR, we are removing the withLocks and other async-mutex usages.

Having locks in the DB muddles the separation of responsibilities.

It turns out our read-committed transactions doesn't actually need locking at all.

You only need locks if you want to also prevent:

non-repeatable reads
phantom reads
lost-updates

However this all depends on the end-user and their specific situation when using transactions. Sometimes they may use only simple mutexes, in other cases they need to use read/write locks, OCC, PCC... etc.

So therefore locking functionality is removed from the DB.

We are however using withF and withG here. And we may import this from: https://github.com/MatrixAI/js-resources

CMCDragonkai commented 2 years ago

We now have tests for each isolation property:

[nix-shell:~/Projects/js-db]$ npm test -- ./tests/Transaction.test.ts 

> @matrixai/db@1.2.1 test /home/cmcdragonkai/Projects/js-db
> jest "./tests/Transaction.test.ts"

Determining test suites to run...
GLOBAL SETUP
 PASS  tests/Transaction.test.ts
  Transaction
    ✓ snapshot state is cleared after releasing transactions (34 ms)
    ✓ get, put and del (17 ms)
    ✓ no dirty reads (17 ms)
    ✓ non-repeatable reads (9 ms)
    ✓ phantom reads (11 ms)
    ✓ lost updates (7 ms)
    ✓ iterator with same largest key (16 ms)
    ✓ iterator with same largest key in reverse (15 ms)
    ✓ iterator with snapshot largest key (12 ms)
    ✓ iterator with snapshot largest key in reverse (12 ms)
    ✓ iterator with db largest key (12 ms)
    ✓ iterator with db largest key in reverse (10 ms)
    ✓ iterator with undefined values (20 ms)
    ✓ iterator using seek and next (19 ms)
    ✓ queue success hooks (5 ms)
    ✓ queue failure hooks (7 ms)
    ✓ rollback on error (7 ms)

Test Suites: 1 passed, 1 total
Tests:       17 passed, 17 total
Snapshots:   0 total
Time:        1.003 s, estimated 3 s
Ran all test suites matching /.\/tests\/Transaction.test.ts/i.
GLOBAL TEARDOWN

And matches this table:

CMCDragonkai commented 2 years ago

The db.db should no longer be used when attempting to access the internal DB at root.

You can use db.dataDb.

However that shouldn't be necessary either.

We now expose:

DB.clear() - use this to clear the dataDb
DB.dump() - use this to dump the dataDb
DB.iterator() - use this to get an iterator on dataDB
DB.count() - use this to count from dataDB

And with that, the DB.test.ts now passes.

CMCDragonkai commented 2 years ago

One issue is the difference in API between DB and Transaction for some methods like clear, iterator, dump and count.

And the fact that I'm only supporting promise-API, like for iterator. So callback style won't work.

CMCDragonkai commented 2 years ago

The DBTransaction interface is removed. In place, I'm going to rename Transaction class to DBTransaction. This is just because Transaction is quite an overloaded name, so this just ensures that when importing like import { DB, DBTransaction } avoids less renaming.

CMCDragonkai commented 2 years ago

Rather than using the AbstractIterator, both DB.iterator and DBTransaction.iterator now both return DBIterator. This will be used as a stopgap until #11 is done.

The APIs between DB and DBTransaction is still different because we want to eventually move to where you can just specify a key path instead of directly maintaining particular sublevel objects. Once all methods can just take a full keypath, there's no need to manage a bunch of sublevel objects, and that should reduce the object maintenance overhead.

This means later in #11, we should have:

// root iterator at dataDb
DB.iterator(options);
// same
DB.iterator(options, []);
// down level 1
DB.iterator(options, ['level1']);
DB.iterator(options, ['level1', 'level2]);
DBTransaction.iterator(options, ['level1', 'level2']);

DB.count();
DB.count(['level1', 'level2']);
DBTransaction.count();
DBTransaction.count(['level1']);

DB.clear();
DB.clear(['level1']);
DBTransaction.clear(['level1']); // this should be a transactional clear, to do this, we must iterator and then do `db.del`

DB.get('key');
DB.get(['level1', 'key']);
// these are not allowed, or they are equivalent to asking for undefined
DB.get([]);
DB.get();

DB.put('key', 'value');
DB.put(['level1', 'key'], 'value');

DB.del('key');
DB.del(['level1', 'key']);
DBTransaction.del('key');
DBTransaction.del(['level1', 'key']);

CMCDragonkai commented 2 years ago

Still need to add DBTransaction.clear method to be a transactional clear that iterates over a sublevel and deletes all the entries.

CMCDragonkai commented 2 years ago

Ok this is ready to merge.

MatrixAI / js-db