Closed nazarhussain closed 3 years ago
This is not something we can realistically handle. You can only safely delete the directory after closing the db in JS. An open db has open file handles, and deleting the directory is gonna lead to undefined behavior. Which is what you are seeing.
As for stopping when the error happens, that is the application's responsibility because it has an unhandled promise rejection. Either add a try/catch or run node with --unhandled-rejections=throw.
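A minimal sketch of those two options (the `db` handle here is hypothetical; any promise-returning API behaves the same):

```javascript
// Option 1: catch the rejection locally so the app can decide what to do.
async function safePut(db, key, value) {
  try {
    await db.put(key, value);
  } catch (err) {
    console.error('put failed:', err.message);
    throw err; // or handle/abort as appropriate
  }
}

// Option 2: make every unhandled rejection fatal, process-wide.
// Equivalent to running: node --unhandled-rejections=throw app.js
process.on('unhandledRejection', (err) => {
  throw err; // crashes the process instead of silently continuing
});
```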
@vweevers If deleting the data directory leads to undefined behavior, a.k.a. throwing exceptions and crashing applications, then I would say that's the intended behavior. If something to that extent happens, then no read or write operation should work and it must throw errors.
But that's not what happens. I am currently debugging another application using rocksdb with the levelup wrapper. After deleting the data directory, the application kept running for over a few hours without crashing or showing any warning message. The first key which was added to the DB is still accessible and can be served by the API, which makes me think that when the data files are removed at runtime, every read/write operation is migrated to in-memory storage. That should never be the case.
I will bring more information as I find it in my debugging.
and it must throw errors
That would be a defined behavior. Undefined behavior means that it might throw, it might not. Anything can happen.
Let me put it this way: deleting the directory breaks a reasonable expectation in RocksDB that no external modifications will be made while the db is open. In any case, the IO is handled in RocksDB. There's nothing we can do here.
You may be able to detect a deletion by watching the filesystem. That would be out of scope for rocksdb. Good luck!
@vweevers I do agree that watching the file system is not in the scope of rocksdb.
But one thing is very clear: rocksdb is not an in-memory data store, it's a file-based key-value store. Persisting data to files is the core responsibility of rocksdb. If for any reason it can't persist a write operation, it should notify the user.
To my understanding the problem is far deeper and more dangerous. Deleting the data directory is just one case where rocksdb can't persist data and hides this failure from the user. I suspect other cases are possible where a disk operation fails or the application crashes and rocksdb can't persist data but keeps hiding those failures.
The whole scenario can be summarized as: rocksdb doesn't provide a write-operation guarantee. Exploring the documentation of RocksDB itself, I didn't find any such reference, so I tend to think it's some underlying issue in the binding.
I just verified one more thing for the application using rocksdb with levelup. I wrote around 30MB, with a max of 15kb per write operation. It never threw a single error and kept all data in memory, because the data file had actually been removed as soon as the connection was opened. So it's not an issue with some write buffer, but rather RocksDB's behavior of switching from file-based persistence to in-memory persistence. Is this the default behavior of RocksDB? If yes, would you share some reference for it?
For reference for people who come across this issue: I found that the behaviour discussed here is native to RocksDB itself and not an issue in the binding. I used the following code to test it. If we run this native code and delete the data directory while it runs, it never complains and never crashes; instead it silently switches over to an in-memory store.
#include <cassert>
#include <cstdio>
#include <iostream>
#include <string>
#include <unistd.h>

#include "rocksdb/db.h"
#include "rocksdb/slice.h"
#include "rocksdb/options.h"

using namespace ROCKSDB_NAMESPACE;
using namespace std;

std::string kDBPath = "rocksdb_simple_example.db";

int main() {
  DB* db;
  Options options;
  // Optimize RocksDB. This is the easiest way to get RocksDB to perform well
  options.IncreaseParallelism();
  options.OptimizeLevelStyleCompaction();
  // create the DB if it's not already present
  options.create_if_missing = true;

  int counter = 0;

  // open DB
  Status s = DB::Open(options, kDBPath, &db);
  assert(s.ok());

  while (counter < 50) {
    // Put key-value
    string key = "key" + to_string(counter);
    s = db->Put(WriteOptions(), key, "value");
    assert(s.ok());

    std::string value;
    // get value
    s = db->Get(ReadOptions(), key, &value);
    assert(s.ok());
    assert(value == "value");

    usleep(300 * 1000);
    counter = counter + 1;
    std::cout << key << " - matched...\n";
  }

  delete db;
  return 0;
}
@vweevers I found the actual reason behind this behaviour. It may be interesting for people following this thread.
An open file is deleted only when all file descriptors are closed
In addition to maintaining a link count for each i-node, the kernel also counts open file descriptions for the file (see Figure 5-2, on page 95). If the last link to a file is removed and any processes hold open descriptors referring to the file, the file won’t actually be deleted until all of the descriptors are closed.
The Linux Programming Interface by Michael Kerrisk, page 346
This Linux behavior was causing RocksDB to keep operating while the data directory was deleted by some other process.
If the persisted database directory is accidentally deleted, read/write operations keep working, except for the first call that fills the write buffer set by writeBufferSize. This is a script which tries some operations on a regular interval. The utility functions used are mentioned at the end of the issue.
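A hypothetical equivalent of such an interval script, assuming promise-based put/get helpers on a `db` object (names are illustrative, not the original code):

```javascript
// Write and read back one key per tick; log any failure instead of
// letting it pass silently. A db with promise-based put/get is assumed.
function startProbe(db, intervalMs = 300) {
  let counter = 0;
  return setInterval(async () => {
    const key = `key${counter++}`;
    try {
      await db.put(key, 'value');
      const value = await db.get(key);
      console.log(`${key} - ${value === 'value' ? 'matched' : 'MISMATCH'}...`);
    } catch (err) {
      console.error(`${key} - failed: ${err.message}`);
    }
  }, intervalMs);
}
```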
If we execute the above code, it shows output like this.
In the meanwhile, if we delete the actual persisted database, then we see the following output.
There are two inherent problems:

1. put keeps writing data without failure until some threshold, I believe writeBufferSize, which is 4MB by default. The problem is that the running application is not aware that the state is not persisted.
2. In contrast, if a similar situation happens with leveldb, every read and write operation fails with an error. I also tried the levelup wrapper and it shows similar behaviour.

This is a rare scenario, but I believe the binding or database should be able to detect the non-persisted state and explicitly fail every read/write operation that is no longer possible. Adding such logic to applications using rocksdb would be overwhelming and redundant. Here is code which promisifies the read/write operations.