Closed - CMCDragonkai closed 2 years ago
In trying out rocksdb, it appears there are differences at the C++ level. The new `classic-level` has `iterator_close` and `iterator_nextv`, while the current rocksdb binding still uses `iterator_end` and `iterator_next`, basically the same as `leveldown`; `classic-level` upgraded its API. We should be able to change up `binding.cc` and recompile.
Also note that rocksdb is 7.9 MiB, while leveldb was only half a megabyte. It takes significantly longer to compile, so I'm trying out multi-core builds for `node-gyp build` using `npm_config_jobs=max`. The `npm_config_jobs=max` setting should be backported to TS-Demo-Lib-Native.
I had a thought that if we were developing solely on Nix, we could just use https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/libraries/rocksdb/default.nix and expect our `binding.gyp` to compile, then at the end just link with whatever shared objects the rocksdb library provides. It would be faster and less maintenance than keeping the rocksdb source code internally. However, that would preclude any ability to compile and build for Windows and possibly other platforms. macOS could still work with Nix, of course. We don't have any Nix integration with macOS in the CI/CD yet; I did explore it, but I wasn't sure whether it would be more performant given that the macOS runners don't have it.
I've tested with rocksdb; the only issue right now is that rocksdb is still using the old abstract-leveldown-compatible API, whereas I'm currently using the abstract-level API. The classic-level already upgraded to abstract-level, but rocksdb hasn't. It's only a few minor changes stemming primarily from this commit: https://github.com/Level/classic-level/commit/b9ba3dd6c5e5b36387d1efb040ad69edb6045dc9. 2 methods are missing from rocksdb's `index.cpp`: https://github.com/MatrixAI/js-db/pull/19#issuecomment-1146639422; see the history from that point onwards: https://github.com/Level/classic-level/commits/main/binding.cc. The migration should be simple; the C++ code is written very similarly between classic-level and rocksdb.
So we will need to first upgrade the rocksdb `index.cpp` in a similar vein to how `classic-level` did it, and maintain API parity with the leveldb `index.cpp`. This diff, https://github.com/Level/classic-level/compare/ba729d223ec5ec54c5ad88e05c11e7bba71315f2...main, shows the changes from the point of forking from leveldown to the classic-level main branch tip. We can use it to figure out what needs to be changed for rocksdb.
I've synced up the `index.cpp` API with the changes in `classic-level`; however, I'm hitting a segmentation fault during the `db_open` call. Will need to debug this further.
Just a silly mistake; it's working now. All `DB.test.ts` and `DBIterator.test.ts` tests pass with rocksdb.
Benchmarks for Matrix ML 1:
[nix-shell:~/Projects/js-db]$ npm run bench
> @matrixai/db@4.0.5 bench
> rimraf ./benches/results && ts-node -r tsconfig-paths/register ./benches
Running "DB1KiB" suite...
Progress: 100%
get 1 KiB of data:
40 632 ops/s, ±3.33% | fastest
put 1 KiB of data:
25 323 ops/s, ±1.62% | 37.68% slower
put zero data:
28 292 ops/s, ±2.01% | 30.37% slower
put zero data then del:
15 497 ops/s, ±2.21% | slowest, 61.86% slower
Finished 4 cases!
Fastest: get 1 KiB of data
Slowest: put zero data then del
Saved to: benches/results/DB1KiB.json
Saved to: benches/results/DB1KiB.chart.html
Running "DB1MiB" suite...
Progress: 100%
get 1 MiB of data:
544 ops/s, ±1.92% | 98.01% slower
put 1 MiB of data:
533 ops/s, ±1.22% | slowest, 98.05% slower
put zero data:
27 299 ops/s, ±1.70% | fastest
put zero data then del:
13 880 ops/s, ±1.65% | 49.16% slower
Finished 4 cases!
Fastest: put zero data
Slowest: put 1 MiB of data
Saved to: benches/results/DB1MiB.json
Saved to: benches/results/DB1MiB.chart.html
And on matrix-vostro-5402-1:
[nix-shell:~/Projects/js-db]$ npm run bench
> @matrixai/db@4.0.5 bench
> rimraf ./benches/results && ts-node -r tsconfig-paths/register ./benches
Running "DB1KiB" suite...
Progress: 100%
get 1 KiB of data:
58 249 ops/s, ±4.22% | fastest
put 1 KiB of data:
30 449 ops/s, ±6.80% | 47.73% slower
put zero data:
35 626 ops/s, ±7.86% | 38.84% slower
put zero data then del:
16 480 ops/s, ±6.61% | slowest, 71.71% slower
Finished 4 cases!
Fastest: get 1 KiB of data
Slowest: put zero data then del
Saved to: benches/results/DB1KiB.json
Saved to: benches/results/DB1KiB.chart.html
Running "DB1MiB" suite...
Progress: 100%
get 1 MiB of data:
578 ops/s, ±2.75% | slowest, 98.11% slower
put 1 MiB of data:
673 ops/s, ±0.53% | 97.8% slower
put zero data:
30 550 ops/s, ±4.90% | fastest
put zero data then del:
18 064 ops/s, ±8.12% | 40.87% slower
Finished 4 cases!
Fastest: put zero data
Slowest: get 1 MiB of data
Saved to: benches/results/DB1MiB.json
Saved to: benches/results/DB1MiB.chart.html
It appears 1 MiB reads have slowed down quite a bit for rocksdb compared to leveldb.
RocksDB's `OptimisticTransactionDB` is a class that extends `StackableDB`, which extends `DB`. The current `index.cpp` uses just `DB`, which means once we switch to `OptimisticTransactionDB`, it affects the entire js-db. Furthermore, rocksdb does not throw exceptions; it relies on the returned `Status`. This must be checked and then thrown as a JS exception, like we are doing right now.
It is possible to use the debugger on the native code when running with Node.js. You have to build a debug version of the native addon with `node-gyp build --debug` or `node-gyp rebuild --debug`. The `node-gyp-build` loader actually prefers the release version over the debug version, so if both builds are available, the release version will be loaded. The debug build is quite heavy, since it ends up building all the dependencies with debug symbols; it comes out at 130 MiB. I reckon one should avoid deleting the release build when attempting to debug: just move it somewhere else, and move it back when you want to use the release build.
Once the debug build is available, one can use `gdb` on `node`, so it ends up doing the same thing as `launch.json`. However, with Node.js it's a bit smarter: the debugger can auto-attach to any Node.js process, which makes it easy to debug Node.js code. Because we are writing TypeScript, upon using `ts-node` the Node.js process runs, the debugger attaches, and breakpoints etc. work. The C/C++ side, however, seems to require a bit more configuration: https://code.visualstudio.com/docs/cpp/launch-json-reference, especially since we may be using ts-node in one case and jest in another; the debugger has to attach to the node process. It could also just work with `gdb --args /usr/local/bin/node app.js`, though I'm not sure whether that triggers vscode's breakpoints. I believe we never used `launch.json` at all, and only relied on vscode's automatic attach system. This made it easier with zero configuration. It would be nice if gdb could also be simultaneously auto-launched/attached for a given program, in this case also Node.js. One might even be able to run both debuggers at the same time, so you can add breakpoints/logpoints in both TS and C/C++ code.
It appears that `DB` supports `OpenForReadOnly`. This is a static member function, inherited by `OptimisticTransactionDB`; however, the static function produces a `DB` and not an `OptimisticTransactionDB`. I'm actually not sure about this. The compiler does complain about an invalid conversion from `DB` to `OptimisticTransactionDB` (or `OptimisticTransactionDB` to `DB`). Obviously we want to open an `OptimisticTransactionDB`, so it seems `OpenForReadOnly` won't work. Given there's no read-only mode, we can remove the readonly option.
It appears changing over to `OptimisticTransactionDB` is fully compatible with normal DB operation. The only major change is the additional `BeginTransaction` method. There are also additional options that we can pass during open: `OptimisticTransactionDBOptions`. These are only available when using this overload:
static Status Open(const DBOptions& db_options,
const OptimisticTransactionDBOptions& occ_options,
const std::string& dbname,
const std::vector<ColumnFamilyDescriptor>& column_families,
std::vector<ColumnFamilyHandle*>* handles,
OptimisticTransactionDB** dbptr);
Note the `ColumnFamilyDescriptor` and `ColumnFamilyHandle` parameters. But the options don't seem relevant to us:
struct OptimisticTransactionDBOptions {
OccValidationPolicy validate_policy = OccValidationPolicy::kValidateParallel;
// works only if validate_policy == OccValidationPolicy::kValidateParallel
uint32_t occ_lock_buckets = (1 << 20);
};
We will need to treat transactions similarly to iterators for the `struct Database` in `index.cpp`. For example:
const leveldb::Snapshot* NewSnapshot () {
return db_->GetSnapshot();
}
leveldb::Iterator* NewIterator (leveldb::ReadOptions* options) {
return db_->NewIterator(*options);
}
We'll add something like:
rocksdb::Transaction* NewTransaction (rocksdb::WriteOptions* writeOptions, rocksdb::TransactionOptions* tranOptions) {
return db_->BeginTransaction(*writeOptions, *tranOptions);
}
Note that the transaction options are optional; the defaults are:
// Options to use when starting an Optimistic Transaction
struct OptimisticTransactionOptions {
// Setting set_snapshot=true is the same as calling SetSnapshot().
bool set_snapshot = false;
// Should be set if the DB has a non-default comparator.
// See comment in WriteBatchWithIndex constructor.
const Comparator* cmp = BytewiseComparator();
};
The only relevant option for us would be `set_snapshot`. I believe it doesn't really need to be set at all: https://github.com/facebook/rocksdb/wiki/Transactions. Once I have the transaction, I can get an iterator with `GetIterator`. It appears it will make use of an overlay of uncommitted transaction state, like what I built in our current `DBTransaction`. This should mean our `DBTransaction` class can likely be simplified quite a bit, relying instead on the C++ code.
This will be useful too https://www.fluentcpp.com/2019/08/30/how-to-disable-a-warning-in-cpp/.
Regarding mixing optimistic and pessimistic transactions, it seems this is not easily doable inside rocksdb. I've asked a question about this: https://groups.google.com/g/rocksdb/c/5v64dTxYKEw. For now I'll proceed with `OptimisticTransactionDB`, and if we need it, serialisable locking could be done on the outside in JS-land; it would of course be subject to potential deadlocks and other possibilities until a lock manager can manage all of those.
The `index.cpp` maintains its own "classes" with `struct Database`, `struct Iterator`, etc. I've created `struct Transaction` to act as the internal reference object that JS-land must pass as a reference into the `rocksdbP.transaction_*` calls.

Therefore there are multiple API layers:

- The `rocksdbP` and `rocksdb` module objects - this is what the JS exposes to be called.
- `transaction_open`, `transaction_commit`, `transaction_rollback` (and more) as the top-level calls, operating on a `struct Transaction` object.
- The `struct Transaction` object maintains an internal `rocksdb::Transaction` object, and performs calls against that internal object.
- The `struct Database`, because the database object also keeps references to the transactions in order to close them when closing the database.

I find it interesting that the C++ land is mirroring the JS land. Do we really need so many object layers? This is how classic-level was already designed, with its `struct Iterator` and `struct Database`.
I imagine that in C++ land, one could potentially do away with its own `Transaction` objects, and instead pass `rocksdb::Transaction` objects directly as references usable from JS-land; why bother creating another object wrapper encapsulating rocksdb objects? Possibly only if the native addon itself has to do complicated stuff that is not related to the internal rocksdb objects.
Once we get around to creating iterators within a transaction, do we say that the `Database` still owns the `Iterator`, or is it now `Database` -> `Transaction` -> `Iterator`?
It would be good to also use `clang-tidy`. Integration requires all the relevant include paths; see https://github.com/mapbox/node-cpp-skel/pull/64. It requires the `clang-tools` package.
Discovered a bug in `iterator_close`, and therefore also in `transaction_rollback` and `transaction_commit`; this affected upstream too. Basically, repeated calls to `iterator_close` would fail because the second call onwards is a no-op and does not call the callback. This causes the JS runtime to never continue executing. The solution is to call the callback with a `null` (indicating no error) whenever we conditionally skip the internal side effect.
I also checked whether `db_close` had the same problem, and it doesn't. This is because it either queues the callback for execution, or assigns it as `database->pendingCloseWorker_`, which is executed at the end of each priority worker. Note that most workers are `PriorityWorker`, except `OpenWorker`, `CloseWorker`, `DestroyWorker`, `RepairWorker`, `CloseIteratorWorker`, and `NextWorker`.
I believe I should be making `CommitTransactionWorker` and `RollbackTransactionWorker` into `PriorityWorker`s as well. They seem important to be prioritised before the other workers, if I'm interpreting this correctly. It seems the `PriorityWorker` is basically intended for async work that should always take priority over closing the DB. Then I wonder why things like `CloseIteratorWorker` and `NextWorker` are not priority workers.
I've got `transaction_init`, `transaction_commit`, and `transaction_rollback` working. I think it's a good idea to split up the C++ code, which is currently 2476 lines atm; it's getting difficult to constantly have 2-3 windows of the same file open. This should just be a matter of creating a few separate files that are separately compiled and linked together. There will need to be headers to include to allow forward declaration.
The `napi-macros` package doesn't export the function names when using `NAPI_EXPORT_FUNCTION`, so I've just redefined it here. In the future we can probably switch to `node-addon-api` instead, which is better maintained. Here's what I've done:
/**
* NAPI_EXPORT_FUNCTION does not export the name of the function
*/
#undef NAPI_EXPORT_FUNCTION
#define NAPI_EXPORT_FUNCTION(name) \
{ \
napi_value name##_fn; \
NAPI_STATUS_THROWS_VOID(napi_create_function(env, #name, NAPI_AUTO_LENGTH, name, NULL, &name##_fn)) \
NAPI_STATUS_THROWS_VOID(napi_set_named_property(env, exports, #name, name##_fn)) \
}
This means the exported functions will have the `name` property, and we can see in the stack trace which function was called. Furthermore, `promisify` is extended to preserve the function name when wrapping:
Object.defineProperty(g, 'name', { value: f.name });
This means `promisify` should be updated in PK as well, @emmacasolin.
In relation to https://github.com/facebook/jest/issues/12814, it turns out that when running in Jest, even exceptions like `Error` created by `napi_create_error` are not instances of `Error`. This means any tests against the native addons will need to be aware that strict type checks won't work. It might also mean that in source code that checks whether `e` is an `instanceof Error`, this just can't be done; instead we must only check properties like `code`.
Looks like I need some forward declarations here; right now there's an include loop between `database.h`, `transaction.h`, and `iterator.h`. I wonder, though, whether these headers need to declare the forward declarations themselves?
The warning about the log level is a legitimate warning. It is possible for the log level configuration to be wrong, and if so, a JS exception would be thrown; however, this doesn't prevent the C++ constructor code from continuing. The ideal case is for the C++ constructor to throw an exception if it fails to construct for whatever reason. The problem here is that the `binding.gyp` and related files were inherited from rocksdb in level, and there seems to be configuration in there that disables C++ exceptions, if not on Linux then at least on Windows. So for now, since we don't want to experiment with C++ exceptions right now, we can do a couple of things:

- `return` to cancel computation; this means the object is still constructed, just the rest of the code hasn't finished initialising. This creates a zombie object, as explained here: https://stackoverflow.com/a/737656/582917.
- Validate ahead of time in `db_open`, thus throwing the JS exception there and not actually proceeding to construct the `OpenWorker`.

I'm going to try out solution 4 (validating ahead of time in `db_open`).
Solution 4 should be working now. We figure out the `rocksdb::InfoLogLevel` and `rocksdb::Logger` ahead of time in `db_open`, and then pass them into the `OpenWorker`. Note that the `NullLogger` is only set when the `infoLogLevel` option is empty; otherwise, it is left at whatever it defaults to.
No more major warnings, and the C++ code is now split out. All tests pass on the split-out code. Also, if an invalid log level is used, we get a JS exception in the callback with code `DB_OPEN` and the message `Invalid log level`.
Now to add in the rest of the transaction functions.
I have the basic transaction methods; it's time to refactor `DBTransaction` to support those methods, and reactivate some simple tests involving `DBTransaction`.
I had to just try the gdb debugging described above.
So the issue was that there was a segfault in the C++ code, occurring in the `Transaction::Put` method call. I narrowed it down to:
rocksdb::Status s = dbTransaction_->Put(key, value);
But I couldn't see where the segfault was coming from.
The first thing was to delete the `build` directory so that the old `Release` build was removed, because `node-gyp` prefers using the `Release` build over the `Debug` build. I believe we should change this by making `node-gyp-build` prefer the debug build first, since that allows us to debug more easily, and then just delete the debug build when we're ready. Then, with `node-gyp rebuild --debug`, we rebuild the debug binary.
Now the issue gets a bit complicated. Normally we use `ts-node` to run things, but `ts-node` isn't the binary we want to use `gdb` on; it's the `node` binary that we want to run it on. Since I don't yet know how to use `gdb` on `ts-node`, I first just used `tsc` to compile all of the code, like `rimraf ./dist && tsc -p ./tsconfig.build.json`. Then I wrote a simple JS script to check, in this case a prototyping script `test-rocksdb.js`. Finally I could use `gdb node`, and inside the GDB shell, `run ./test-rocksdb.js`.
In this case I'm not even using a breakpoint at all. But I did have https://github.com/cyrus-and/gdb-dashboard integrated into my system.
What it showed me was that the `this` pointer in the stack frame was `0x0`: a null pointer. Subsequently checking `this` against `nullptr` also confirmed that it was null. Note that subsequent runs/builds didn't always show this; sometimes it ended up being `0x3`. In any case, `this` was not what it should be: a pointer to the `Transaction` object.
That's when I checked `transaction_workers.cpp` and noticed that I had forgotten to initialise `tran_` in the other transaction workers I created; only `TransactionCommitWorker` and `TransactionRollbackWorker` had it properly set up. Not initialising members means they are left indeterminate: possibly a null pointer, possibly garbage, and using them is undefined behaviour. The usage of C++ constructor initialiser lists is definitely error-prone, since the list has to be in order of the declarations in the struct/class, and vscode doesn't tell you anything about a pointer not being initialised, because there are legitimate uses for doing so.
For further intensive dev on C/C++, we definitely need to figure out how to integrate gdb into the vscode setup while automatically compiling the TypeScript first, and probably integrate clang-format into it too.
It's important to be aware that rocksdb slices can be converted to C++ strings with `key.ToString()`. To convert to C strings, `c_str()` can be chained on: `key.ToString().c_str()`. This matters if you want to use `fprintf`; but in C++, `std::cerr` is more idiomatic:
std::cerr << key.ToString().c_str() << std::endl;
std::cerr << value.ToString() << std::endl;
`clang-tidy` does some semantic checks. This is how you use it:
clang-tidy \
--checks='*' \
./src/rocksdb/napi/worker.cpp \
-- \
-isystem /nix/store/zl4bvsqfxyx5vn9bbhnrmbmpfvzqj4gd-nodejs-16.14.2/include/node \
-isystem /nix/store/hym1n0ygqp9wcm7pxn4sfrql3fg7xa09-python3-3.9.12/include \
-isystem ./deps/rocksdb/rocksdb/include \
-isystem ./node_modules/napi-macros
It can make use of a `.clang-tidy` configuration file as well, but I haven't figured that out yet. Note the usage of `include/node`. This is because we are currently including `node_api.h`, and not `node/node_api.h`; it seems `node-gyp` understands to look inside the `node` directory too.
There are a lot of checks and not all are relevant, so we would need to create our own standard, like we do with `.eslintrc`. Because `clang-tidy` requires compilation flags, later we can just use `$NIX_CFLAGS_COMPILE`.
More information here: https://medium.com/@inflatonn/better-coding-practices-for-your-organisation-using-clang-tidy-and-clang-format-d736864b96e2
It seems that to be compatible with the rest of the system, it would make sense to use `#include <node/node_api.h>` instead of just `#include <node_api.h>`. We can try it and see whether it compiles well on the other platforms too.
We now have `npm run lintnative` and `npm run lintnativefix`. I disabled the include sorting because it isn't able to know whether something is an external library atm; the `clang-format` rules are configured in `.clang-format`. The `clang-tidy` invocation could be further improved like:
clang-tidy \
--checks='*' \
./src/rocksdb/napi/worker.cpp \
-- \
$NIX_CFLAGS_COMPILE \
-isystem ./deps/rocksdb/rocksdb/include \
-isystem ./node_modules/napi-macros
Note that globbing doesn't work nicely in `npm run` scripts, so you must use `find` instead. The main problem is that compilation flags are necessary, and since `node-gyp` currently holds all of that configuration, configuring `clang-tidy` is extra work. Anyway, having `clang-format` working is great for now.
It might be better to rely on https://devblogs.microsoft.com/cppblog/visual-studio-code-c-december-2021-update-clang-tidy/.
I've tested `transactionGetForUpdate`, and it now works to solve write-skew anomalies pretty easily. Basically it promotes a read into a same-value write. This explains my earlier question about the difference between software transactional memory and databases: STM seems to rely on value equality, but databases use logical timestamps instead, so writing the same value just means updating a logical timestamp. Databases have to deal with more sophisticated data, whereas STM can get away with a simpler model. We can now proceed to bring in iterators for the transactions, and get onto fixing up `DBTransaction`.
In bringing in first-class snapshots, we had to create a `Snapshot` class that encapsulates `rocksdb::Snapshot`. The reason is the same as for iterators and transactions: object lifecycle management. If we were to return a reference to a C++ object from a native method, JS would expect it to be garbage collected as soon as the scope finished. Because the object must live beyond the scope of the native method, we have to keep track of it. This is done by incrementing the reference count by 1, but also keeping a reference to it in the `Database`. This way, if the `Database` closes, it will end up releasing all of these snapshots.
Note that there are several ways the `Database` closes:

- The `dbClose` native method; this does it asynchronously, but it also uses `snapshot_release_do`.
- The `napi_env` expiring or becoming invalid, which is triggered through a cleanup hook. This happens when the node runtime itself is closing, and ends up calling `env_cleanup_hook`, which does the cleanup synchronously.

Now there's something additional one has to be concerned about when calling `dbClose`. Because it is asynchronous, the codebase currently tries to schedule these closing operations after any priority operations that are still running.
This is done through the `PriorityWorker` and a reference count on `Database` called `Database::ref_`, with the count value kept in the `Database::priorityWork_` property. So if there is priority work to do, the closing operation is registered under `Database::pendingCloseWorker_`, which ends up being called when the reference count reaches 0.
This explains why `CloseWorker` extends `BaseWorker` and not `PriorityWorker`. Now we have a similar situation for transactions. If we call `dbClose()` while there are still asynchronous operations in the transactions, we should be scheduling the transaction rollbacks after any priority work being done in the transaction. This is what's still TBD in `transaction_rollback_do`, as well as `transactionCommit`:
// TODO:
// if other async ops, delay this operation
To solve this, we should be adding a reference count of priority work in our `Transaction`. Then a `PriorityWorker` that comes out of a `Transaction` should decrement the count on the `Transaction`, not just the `Database`. For iterators this was done differently: a `DoFinally` was registered for all the workers in the iterator, so that `iteratorClose` is always scheduled after any asynchronous concurrent `NextWorker`. That was easy to do because there was only one other asynchronous method. For transactions it would be a bit more tedious to implement.
The reason is that `dbClose` currently just immediately deletes the database if there is no priority work. Priority work isn't just whenever there's a `PriorityWorker`; it's also whenever a `Transaction`, `Snapshot`, or `Iterator` is still live. We could create a second constructor for `PriorityWorker` that increments and decrements the priority work on a `Transaction` instead.
The new `Snapshot` class, when constructed with a `Transaction`, will do:
Snapshot::Snapshot(Transaction* transaction, const uint32_t id)
: database_(transaction->database_),
id_(id),
isReleasing_(false),
hasReleased_(false),
ref_(NULL) {
// This ensures that the transaction has consistent writes
transaction->SetSnapshot();
// Use this snapshot to get consistent reads
dbSnapshot_ = transaction->GetSnapshot();
}
This ends up creating consistent reads and writes when snapshots are used. One just needs to call `transactionSnapshotInit` like:
const snapshot = await transactionSnapshotInit(tran);
The DB has its own version, called `dbSnapshotInit`. At the end, the snapshot should be released with `snapshotRelease`:
// notice this is asynchronous
await snapshotRelease(snapshot);
Transactions have their own priority count now. However, I think I broke something there; I need to verify that the ref counts make sense. Transaction ref counts appear to be working. I have a test for snapshots now, but I need to figure out what happens if you only call `transactionSnapshotInit()`. Because it gives you back a snapshot object, you can't just call it without dealing with the result; otherwise you have a resource leak. Also, the snapshot itself can't actually be released until after the transaction is committed or rolled back. I would argue that if that's the case, the snapshots should be invalidated at that point too; but right now snapshot release has nothing to do with the transaction, and is instead done as part of the database shutdown.
So one must first `transactionCommit` and then `snapshotRelease` the snapshot if it was used. This is based on the theory that the snapshot needs to be cleaned up while the transaction in which it was acquired is still live. Perhaps one can release the snapshot before the transaction commits; it doesn't seem to affect the commit itself.
It's definitely limiting not being able to use C++ exceptions; it makes it difficult to write procedures that terminate the execution path early and have the error bubble upwards. If we write functions, we then need to check whether those functions returned true, but this goes against my ideal of writing procedures and functions separately, so that it's clear the former performs side effects and the latter doesn't. So for now I've settled on macros. Basically, the issue is that we should be checking whether the transaction is ready to use. If it has already been committed or rolled back, it should not be usable afterwards; without an appropriate check, we end up with a segfault. It seems common to hit a segfault whenever something that shouldn't be used is being used: basically a null pointer exception, which would be obvious in a managed language, whereas in C++ we don't see it at all, it's just a segfault.
So for now the C++ code isn't the best abstracted atm; it would require more conditional checking and some further abstractions to clean up in the future, especially checking for valid states in which an operation can be performed. If we are to improve the C++ codebase, we should be using node-addon-api for more sophisticated work. Not having exceptions is like trying to write in C, and C is just quite limited in its capabilities.
And yes, it does appear C, C++ (https://stackoverflow.com/a/32127664/582917), and Rust (`Result`) all prefer using something other than exceptions for high-performance error control flow. But if it's ultimately connected to JS control flow, I think it's fine to use exceptions.
It appears that after committing the transaction, attempting to release the snapshot causes a segfault. I ran gdb over it, but it just steps into the instruction delete casted_s
and results in Cannot access memory at address 0x0
. I suspect this may simply due to the fact that a transaction's snapshot doesn't need to be released if already released through committing/rollbacking the transaction.
Now if I don't release the snapshot, the act of closing the database causes it to attempt to release the snapshot. So I suspect, perhaps if we commit/rollback the transaction we must not attempt to release that snapshot.
Actually it turns out that if you release the snapshot earlier, this causes a bug in the transaction when it later attempts to close.
I think you should never manually release the snapshot if it was created from the transaction, because then the snapshot is part of the lifecycle of the transaction. Hence why it's called `Transaction::GetSnapshot`. All the example code shows just setting the snapshot pointer to `nullptr` after the transaction is completed.
Anyway, to solve the problem we just have to ensure not to release the snapshot early. Or if we are releasing the snapshot early, we would probably also have to clear the snapshot from the transaction before committing or rollbacking it.
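The ownership rule can be sketched with a toy model (this is not the real RocksDB API, just the lifecycle it implies): a snapshot obtained from the transaction belongs to the transaction, callers never delete it themselves, commit releases it, and `getSnapshot` returns `nullptr` unless the snapshot was set first.

```cpp
#include <cassert>
#include <cstddef>

struct Snapshot {};

// Toy model of the transaction-owned snapshot lifecycle: the transaction
// allocates and frees its own snapshot; callers only ever observe the
// pointer and drop it (never delete it) once the transaction completes.
struct TxnWithSnapshot {
  Snapshot* snapshot = nullptr;

  void setSnapshot() {
    if (snapshot == nullptr) snapshot = new Snapshot();
  }

  // Returns nullptr unless setSnapshot() was called first.
  Snapshot* getSnapshot() { return snapshot; }

  void commit() {
    delete snapshot;     // the transaction releases its own snapshot
    snapshot = nullptr;  // guards against a double-free on close
  }
};
```

Deleting the pointer returned by `getSnapshot` from the outside would be exactly the double-free that produces the segfault described above.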
So we should make `snapshotRelease` a `dbSnapshotRelease` instead, since it cannot be applied to snapshots acquired from a transaction. If it is applied, we should add a function that throws an error as part of the release. The transaction snapshot is only applicable to the read calls. But any snapshot that is no longer relevant should be switched with a boolean, to prevent callers from re-using the snapshot.
You have to set the snapshot before getting it; if you get it from the transaction before setting it, it will give back a null pointer.
We now have `RocksDBTransactionSnapshot` and `RocksDBSnapshot`, 2 separate types: the former only for transaction snapshots, the latter exposing database snapshots. Don't mix them up!
Derived from #18
Description
This implements the snapshot isolation DB transaction.
This means DBTransaction will be automatically snapshot isolated, which means most locking will be unnecessary.
Instead, when performing a transaction, there's a chance of an `ErrorDBTransactionConflict` exception, which means there was a write conflict with another transaction. Users can then decide at their discretion to retry the operation if they need to (assuming any non-DB side-effects are idempotent, noops, or can be compensated). This should reduce the amount of locking overhead we need to do in Polykey. We may bubble up the conflict exception to the user, so the user can re-run their command, or in some cases we will automatically perform a retry in-code. The user in this case can be the PK client, another PK agent, or the PK GUI.
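The automatic retry path can be sketched generically (hypothetical names, not the js-db API): on a conflict the whole transaction body is re-run, which is only safe when its non-DB side effects are idempotent or compensated.

```cpp
#include <cassert>

// Result of attempting to commit an optimistic transaction.
enum class TxResult { Committed, Conflict };

// Re-runs the transaction body until it commits or retries are exhausted.
// Safe only if the body's non-DB side effects are idempotent/compensated.
template <typename F>
TxResult runWithRetry(F body, int maxRetries) {
  TxResult result = TxResult::Conflict;
  for (int i = 0; i < maxRetries && result == TxResult::Conflict; ++i) {
    result = body();
  }
  return result;
}
```

In practice the retry count and backoff would be policy decisions per call site; some conflicts should instead bubble up to the user.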
There is still one situation where user/application locks are needed, and that's where there may be a write-skew. See snapshot isolation https://en.wikipedia.org/wiki/Snapshot_isolation for more details and also https://www.cockroachlabs.com/blog/what-write-skew-looks-like/.
In the future we may upgrade to SSI (serializable snapshot isolation) which will eliminate this write-skew possibility.
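Write skew is easy to illustrate with plain integers (no real DB involved): two transactions each validate an invariant like x + y >= 1 against the same snapshot, then write disjoint keys, so first-committer-wins conflict detection sees no write-write conflict and both commit.

```cpp
#include <cassert>

// Toy write-skew demonstration under snapshot isolation: both
// "transactions" read the snapshot {x=1, y=1}, each check the invariant
// x + y >= 1 against that snapshot, and each write a different key, so
// neither sees a write-write conflict. Returns the final x + y.
int writeSkewDemo() {
  int x = 1, y = 1;                // committed state
  const int snapX = x, snapY = y;  // shared snapshot view

  if (snapX + snapY >= 2) x = 0;   // T1: invariant holds in its snapshot
  if (snapX + snapY >= 2) y = 0;   // T2: invariant also holds in ITS snapshot

  return x + y;                    // 0: the invariant is now broken
}
```

Under serializable isolation one of the two would be forced to abort; under snapshot isolation this is exactly the case that still needs an application-level lock or a `getForUpdate` to materialize the conflict.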
~Additionally this PR will also enable the `keyAsBuffer` and `valueAsBuffer` options on the iterators, enabling easier usage of the iterators without having to use `dbUtils.deserialize<T>(value)` where it can be configured ahead of time.~ - already merged

See https://www.fluentcpp.com/2019/08/30/how-to-disable-a-warning-in-cpp/ as to how to disable warnings in C++ cross platform.

Also see: https://nodejs.github.io/node-addon-examples/special-topics/context-awareness/
Issues Fixed

Tasks

- ~Enable `keyAsBuffer` and `valueAsBuffer`~ - already merged in staging
- `DBTransaction` lifecycle
- `snapshotLock` to ensure mutual exclusion when using the snapshot iterator
- `getSnapshot` as the last-resort getter for `DBTransaction.get`
- `DBTransaction.iterator` to use the snapshot iterator
- ~`keyAsBuffer` and `valueAsBuffer` usage on the iterator, expect `string` and `V` types. Existing tests should still work, since by default the iterator returns buffers. This is different from `get`, which by default does not return raw buffers.~ - already merged into staging
- `ErrorDBTransactionConflict`
- `sync` option when committing `DBTransaction` by default, allow it to be set off
- ~`db_destroy` can replace our removal of the db path~ - no need for this, because we may have more interesting state in the db path
- ~`CORRUPTION` error code and attempt automatic repair and reopen, or at least provide users the ability to call the command with a special exception for this - https://github.com/MatrixAI/js-db/pull/19#issuecomment-1143114335~ - To be addressed later in #35
- `LevelDB` and `LevelDBP` in `DB.ts` directly
- `DBIterator` to maintain async lifecycle of the iterator - https://github.com/MatrixAI/js-db/pull/19#issuecomment-1145852220 (it was a good thing we caught this during benchmarking, a serious performance regression that led us to discover a resource leak)
- ~`ErrorDBLiveReference` to mean that an iterator or transaction object is still alive when `db.stop()` is called, and therefore it must prevent any stopping; we must therefore maintain a weakset for every transaction/iterator object that we create. This can also be used for subdatabases, where you create a prefixed DB in the future.~
- `WeakSet` does not support size or length, there's no way to know how many of these objects are still alive
- `DB` and subtract an allocation counter instead, or just maintain references to objects and remove them
- `Set` then, and `destroy` removes them from the set
- `ErrorDBLiveReference`, just auto-closes the same as in C++
- `transaction_*` native functions:
  - `transactionInit`
  - `transactionCommit`
  - `transactionRollback`
  - `transactionGet`
  - `transactionGetForUpdate`
  - `transactionPut`
  - `transactionDel`
  - `transactionMultiGet`
  - `transactionClear`
  - `transactionIteratorInit`
  - `transactionSnapshot`
  - `transactionMultiGetForUpdate`
- `GetForUpdate`, this "materializes the conflict" in a write skew so that it becomes a write-write conflict, although rocksdb calls this a read-write conflict. SSI is not provided by rocksdb; it is however available in cockroachdb and badgerdb, but hopefully someone backports that to rocksdb.
- `DBIterator` doesn't seem to work when no level path is specified; it iterates over no records at all. It's possible that our iterator options don't make sense when the `data` sublevel isn't used.
- Swap the `levelPath` and `options` parameters, because `levelPath` is far more used than `options`. This does imply an API break for EFS... etc, but it should be a quick find and replace

Final checklist