Closed CMCDragonkai closed 2 years ago
Based on this comment https://github.com/MatrixAI/js-db/pull/2#issuecomment-946367753 about level-js
, does snapshot isolation exist in level-js
and IndexedDB
?
Ok so it turns out we can get a snapshot of leveldb by doing db.iterator()
.
This iterator then maintains a snapshot, and exposes the methods of:
// same options as the stream creators
db.iterator({ ... })
iterator.db // the db instance (does this maintain the snapshot?)
iterator.seek()
await iterator.next()
await iterator.end()
for await (const [key, value] of iterator) {
// ...
}
The combination of the seek
and next
allows you to use this to get values from the snapshot as well.
This is useful because you can potentially create multiple snapshots at the same time:
const i1 = db.iterator();
const i2 = db.iterator();
This allows you to potentially stream over i2
over a different area such as from subleveldown, then using i1
to acquire other values.
This system should produce a repeatable-read system, since now we can ensure that we always read the same value within the same transaction. Remember that MySQL's default isolation level is repeatable read. But PostgreSQL's default isolation level is read-committed.
Read Committed is the default isolation level in PostgreSQL. When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions. In effect, a SELECT query sees a snapshot of the database as of the instant the query begins to run. However, SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed. Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes after the first SELECT starts and before the second SELECT starts.
With this in mind, as our current transaction is read-committed level, and is not repeatable-read, if we want repeatable-read, it would be possible to use iterator to achieve this.
There is one other issue, with our current transaction system, there's no way to iterate transactionally without creating the iterator
snapshot. I'm not entirely sure what this would mean though.
The types for level ecosystem hasn't really been updated with these new methods though.
I think an order of priority can be like:
getMany
and the createReadStream
and so on.subleveldown
take buffers and not just strings for prefixesget(keys: string | Buffer | Array<string | Buffer>)
tran.iterator()
and you would have to compare a SortedMap
key values against the snapshot's key values, and prefer what you have changed firstThis would be a big release though, 4. would be a breaking change, requiring changes in PK and EFS.
With the merging #10, we have decided this is not actually ever needed. Our transactions are going to be read-committed and if the user requires repeatable read, they just need to use some advisory locking.
Relevant:
@CMCDragonkai commented on Sat Aug 14 2021
Is your feature request related to a problem? Please describe.
Proper MVCC transactions make use of a snapshot that represent the state of the database. As mutations (put & delete) build up, they apply to the snapshot. This makes it easier to build up a transaction by composing procedures/functions that mutate a transaction object. It means you get read-after-write consistency within the snapshot. Where you may have an operation that depends on the state of the database. Like let's say a counter increment. But if the prior mutation to the transaction already incremented the same counter, it would be incoherent for the subsequent mutation to plus 1 to the counter thinking the counter hasn't already been incremented.
Right now leveldb already supports snapshots natively. However it's not exposed via the JS wrapper. There are pre-existing issues.
If we could have snapshot isolated transactions, it would simplify some of our algorithms here for inodes especially since we have counter increment operations that result from linking and unlinking the inodes.
Describe the solution you'd like
Have a snapshot ability for the leveldb that we can expose through our
DB
abstraction. Similar to the python library of leveldb: https://plyvel.readthedocs.io/en/latest/api.html#snapshotNote that a snapshot by itself is not sufficient to provide snapshot-isolated transactions. A combination of a "mutating" snapshot and the batching object which can overlay changes on top of the leveldb database can produce snapshot-isolated transactions.
This would mean an API like:
In fact I suspect it would be possible to just extend the
Batch
object to have this.Additional context
@CMCDragonkai commented on Sat Aug 14 2021
We often have dual operations one that is
doX
anddoXOps
.The reason is that currently
doXOps
can be composed to batch up an atomic commit.However with this transaction concept, it may be possible to do
doX(t)
where you pass a transaction in.This should mean if you pass a transaction, it applies the operations against the transaction.
If you don't it creates its own transaction and commits to the database.
You may need to use try catch above.
This would again cut down on the number of code we have and simplify it quite a bit.
@CMCDragonkai commented on Sat Aug 14 2021
So you'd need to:
Batch
classThe first one is the toughest to eventually merge into leveldown. It would be a fork of leveldown supporting this feature, and we would have to create our own "forked" bundle that bundles the forked leveldown supporting snapshots.
@CMCDragonkai commented on Sat Aug 14 2021
A proper MVCC system would also include lock management centralised at the DB class which would then coordinate multiple transactions together. This would be similar to how we are currently using async-mutex to synchronise operations. You then have concurrency control introduced.
It would generalise the entire special casing of async mutexes that we are dealing with right now.
@CMCDragonkai commented on Wed Aug 18 2021
In the process of solving #47 we have created our own makeshift transaction/snapshot system. It operates with read-committed isolation level.
The
Transaction
object doesn't maintain locks by itself, and it doesn't check locking for keys. But instead relies on the user to specify the relevant locks in thedb.transaction
method. So it's an "advisory transaction with read-committed snapshot isolation". I'm probably mixing up concepts to create a syncretic practical solution for our current situation.However as a further extension, it may be possible to also incorporate:
To create an in-memory snapshot.
@CMCDragonkai commented on Tue Oct 12 2021
This issue is more relevant to js-db now instead of EFS. EFS makes use of DB transactions.