MatrixAI / js-db

Key-Value DB for TypeScript and JavaScript Applications
https://polykey.com
Apache License 2.0
5 stars 0 forks source link

Get after delete within transaction should be consistent #16

Closed CMCDragonkai closed 2 years ago

CMCDragonkai commented 2 years ago

Description

Bug discovered in https://github.com/MatrixAI/js-encryptedfs/pull/63#issuecomment-1101191551.

Right now getting a key value after deleting the same key value within the same transaction does not work. It ends up showing the key value as if it was not deleted. Only committing the transaction results in the key value being deleted.

Confirmed with:

import Logger, { LogLevel, StreamHandler } from '@matrixai/logger';
import { DB } from './src';
import * as testUtils from './tests/utils';

async function main () {

  const logger = new Logger(`${DB.name} Test`, LogLevel.WARN, [
    new StreamHandler(),
  ]);

  const crypto = {
    key: testUtils.generateKeySync(256),
    ops: {
      encrypt: testUtils.encrypt,
      decrypt: testUtils.decrypt,
    },
  };

  const db = await DB.createDB({
    dbPath: './tmp/db',
    crypto,
    fresh: true,
    logger
  });

  const acquireTran = db.transaction();

  await db.put(
    ['INodeManager', 'data', '2', Buffer.from([0])],
    Buffer.from('abcdef'),
    true
  );

  let releaseTran, tran;

  [releaseTran, tran] = await acquireTran();

  console.log('BEFORE DELETE', await tran!.get(
    ['INodeManager', 'data', '2', Buffer.from([0])],
    true
  ));

  await tran!.del(
    ['INodeManager', 'data', '2', Buffer.from([0])],
  );

  console.log('AFTER DELETE', await tran!.get(
    ['INodeManager', 'data', '2', Buffer.from([0])],
    true
  ));

  await releaseTran();

  console.log('AFTER TRANSACTION', await db.get(
    ['INodeManager', 'data', '2', Buffer.from([0])],
    true
  ));

  await db.stop();

}

main();

Issues Fixed

Tasks

Final checklist

CMCDragonkai commented 2 years ago

The problem with deletion is this.

It only deletes within the snapshot.

In the get, operation, it looks up within the snapshot first, and if it doesn't exist, it then looks up directly from the the underlying DB. And since the data still may exist in the underlying DB, then get ends up getting the data from the DB.

Something similar should happen like this:

db.put('hello', 'world');
tran.put('hello', 'another');
tran.del('hello')
tran.get('hello') // becomes world

It seems our delete needs to place a "tombstone", so that if has been deleted, it should not then try to look it up from the DB anymore, it should accept that the data does in fact not exist.

Right now our snapshot is in the DB itself, that is the snapshot state is no longer in-memory, so I wonder if it is possible to "wrap" our values in the a snapshot state, so we can differentiate the tombstone induced by tran.del vs really having undefined in the underlying DB.

CMCDragonkai commented 2 years ago

In terms of wrapping the state or using tombstones, we could do this in different ways:

  1. Wrap incoming values in the tran to the snapshot using JSON. This can be kind of inefficient, since it involves JSON serialisation. But you could then do { value: T | undefined }, and thus if { value: undefined } then it has been deleted from the snapshot, whereas not having a value at all would mean you have to load it from the DB.
  2. Making use of extra sublevel to add metadata to our snapshot. So instead of just [transactions, 123], we can also have [transactions, 123.meta]. We know that any key path is appended to the [transactions, 123] we don't control this key path, as users may supply anything, so we just need an additional key that is checked to see if a tombstone has been added.

Solution 2 seems to be most simple and effective. It just becomes one extra step in the get, put, del operations. The iterator will also need to keep track of this tombstone.

CMCDragonkai commented 2 years ago

I've done this by adding data and tombstone sublevels to the transaction path.

However I've noticed that the iterator also needs to be updated because if the key doesn't exist in the snapshot it ends up returning what is in the underlying DB.

I need it to interrogate the tombstone as well before returning the underlying data.

CMCDragonkai commented 2 years ago

I've fixed it in the simple case where the data iterator checks if the value is actually entombed. However, this may not work if both tran iterator and data iterator gives back a value.

In that case, it's possible that the data iterator's key is still entombed, and in that case, we must also ignore that data iterator's value, and jump to the next data iterator's value.

CMCDragonkai commented 2 years ago

Fixed the get after delete even for iterators, extra tests confirmed this even for reverse case and also for multiple entombed keys and for multi-level keys.

CMCDragonkai commented 2 years ago

When using withG or withTransactionG, you may need to specify the T type explicitly:

    const g = db.withTransactionG(async function* (
      tran: DBTransaction,
    ): AsyncGenerator<[Buffer, Buffer]> {
      for await (const [k, v] of tran.iterator()) {
        yield [k, v];
      }
    });

Same problem as before: https://stackoverflow.com/questions/71645327/typescript-doesnt-infer-tnext-unknown-for-asyncgenerator-function-expressi