Level / level

Universal abstract-level database for Node.js and browsers.

feature request: support 2 phase commit #192

Closed LongTengDao closed 3 years ago

LongTengDao commented 3 years ago

Let me use JavaScript to describe it.

Currently:

const level = require('level');
const db = level('/db');

export async function io (actions) {
  await db.batch(actions);
}

But when I want to perform a transaction together with another leveldb (or with other writes such as fs.writeFile), I must record the batch myself in a userland log:

const level = require('level');
const db0 = level('/db0');
const db1 = level('/db1');

// recovery on startup:
const recorded = await readRecorded();
if ( recorded ) {
  await db0.batch(recorded[0]);
  await db1.batch(recorded[1]); // [^1]
}

export async function save (actions0, actions1) {

  await record(actions0, actions1); // which causes the I/O twice

  await db0.batch(actions0);
  // [^1]: if the process crashes or the computer powers off here, the transaction will be completed from the record on reboot.
  await db1.batch(actions1);

}
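
(readRecorded() and record() above are hypothetical userland helpers, not part of level. A minimal sketch of them, assuming a single JSON record file made atomic-ish via a temp-file-then-rename pattern:)

const fs = require('fs/promises');

const RECORD = '/record.json'; // hypothetical path for the userland log

async function record (actions0, actions1) {
  // write a temp file first, then rename, so the record appears all-or-nothing
  await fs.writeFile(RECORD + '.tmp', JSON.stringify([ actions0, actions1 ]));
  await fs.rename(RECORD + '.tmp', RECORD);
}

async function readRecorded () {
  try {
    return JSON.parse(await fs.readFile(RECORD, 'utf8'));
  } catch (error) {
    if (error.code === 'ENOENT') return null; // no pending transaction
    throw error;
  }
}

(A real version would also fsync, and delete the record once both batches have been applied.)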

Aim (the API could be as below, or something along those lines):

const level = require('level');
const option = await hadRecorded()
  ? 'second'  // [^1]
  : 'giveup'; // [^2]
const db0 = level('/db0', option);
const db1 = level('/db1', option);

export async function save (actions0, actions1) {

  await db0.batchFirst(actions0);
  // [^2]: if the process crashes or the computer powers off here, we can `const db0 = level('/db0', 'giveup');` ourselves.
  await db1.batchFirst(actions1);

  await record();

  await db0.second();
  // [^1]: if the process crashes or the computer powers off here, we can `const db1 = level('/db1', 'second');` ourselves.
  await db1.second();

}
vweevers commented 3 years ago

Could you please elaborate? I don't understand what db.batchFirst() should do or how it'd even know about the second db.

I'm certain though that 2PC should be implemented in userland, as it requires (at minimum) a transport, protocol & coordinator, which is far outside the scope of a single-process level db.

LongTengDao commented 3 years ago

@vweevers

Could you please elaborate? I don't understand what db.batchFirst() should do or how it'd even know about the second db.

I'm certain though that 2PC should be implemented in userland, as it requires (at minimum) a transport, protocol & coordinator, which is far outside the scope of a single-process level db.

Ok. What I mean is: when I want to make writes to multiple dbs behave as one whole transaction, I first have to additionally record all the data I am about to pass to the leveldb API, and only then actually call the API, so that if power is lost during this window, I can finish the rest from the record on reboot.

But this means doubling the write cost. Since leveldb itself already implements a log-dump ① mechanism within a single database, such a heavy cost could easily be avoided with a slight change to leveldb.

① dump: I am trying to use the leveldb term here; it means moving data from the log to an SST file.

What db.batchFirst() does is very much like db.batch(); the difference is that it only writes the log, and never dumps unless db.second() is called. If the program dies between the two calls, then the next time level('/db') is opened the user can specify level('/db', option) to decide: give up the last log, or make it dump-able.

This allows users to implement multi-database transactions without additionally recording all of the batch data: when every db.batchFirst() returns successfully, the userland code simply records a single flag. If power is lost at that moment, the next time we open the db we pass the 'second' option and all pending log entries become dump-able; if the flag does not exist, the userland code knows it failed halfway and passes 'giveup' when opening, so all the data logged by the last db.batchFirst() calls disappears.
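
(hadRecorded() and record() in my proposal are again hypothetical userland helpers; with this API they only need to persist a single flag rather than all the batch data. A minimal sketch, with a placeholder file path:)

const fs = require('fs/promises');

const FLAG = '/commit-flag'; // hypothetical path; any durable marker would do

async function record () {
  await fs.writeFile(FLAG, ''); // the commit point: one tiny write instead of all the data
}

async function hadRecorded () {
  try { await fs.access(FLAG); return true; }
  catch { return false; }
}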

I hope I explained it clearly... Thanks for continuing to communicate!

vweevers commented 3 years ago

Can you define "dump"? If you mean moving a log to an SST (.ldb file) this is done when the log reaches a certain size (or upon startup when recovering from a crash), rather than after every batch AFAIK. Which is to say, it can contain multiple batches.

LongTengDao commented 3 years ago

@vweevers

Can you define "dump"? If you mean moving a log to an SST (.ldb file) this is done when the log reaches a certain size (or upon startup when recovering from a crash), rather than after every batch AFAIK. Which is to say, it can contain multiple batches.

Ok. I have refined my wording above; it was not rigorous. Yes, "dump" means what you described, and leveldb can keep the log-dump mechanism you mention. What db.second() is expected to do is not really an internal dump, but to make the un-dump-able log entries added by db.batchFirst() become dump-able (equivalent to log entries added by db.batch()).

In other words, in the traditional mode things work like this:

When db.batch() is called, leveldb performs a log write. Once that succeeds, the db counts as successfully modified. As for the internal dump, it can be completely opaque and has nothing to do with the user.

But in the new mode:

When db.batchFirst() is called, leveldb performs a log write, but the data is in an un-dump-able state; unless db.second() is called, or the db is opened with level('/db', 'second'), this data will never dump (and it disappears when the db is opened with level('/db', 'giveup')).
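
(A toy model of the proposed states, purely illustrative; nothing below exists in level or LevelDB:)

const PENDING  = 'pending';  // written by batchFirst(): logged, never dumped
const DUMPABLE = 'dumpable'; // written by batch(), or promoted by second()

function openWithOption (entries, option) {
  if (option === 'second') {
    // promote everything left pending by a crash
    return entries.map(e => e.state === PENDING ? { ...e, state: DUMPABLE } : e);
  }
  if (option === 'giveup') {
    // discard everything left pending by a crash
    return entries.filter(e => e.state !== PENDING);
  }
  return entries; // normal open: pending entries stay pending until second()
}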

vweevers commented 3 years ago

moving a log to an SST (.ldb file) [..] is done when the log reaches a certain size (or upon startup when recovering from a crash)

To clarify, what I meant to say is that it's outside of our control. This mechanism is implemented in LevelDB.

LongTengDao commented 3 years ago

moving a log to an SST (.ldb file) [..] is done when the log reaches a certain size (or upon startup when recovering from a crash)

To clarify, what I meant to say is that it's outside of our control. This mechanism is implemented in LevelDB.

Oh! Sorry! I thought this repo was the LevelDB implementation repo...

vweevers commented 3 years ago

With that clarified, it seems there's no further action item here, so I'm closing this.