Level / rocksdb

Pure C++ Node.js RocksDB binding. An abstract-leveldown compliant store.

Read and write operations on a non-persisted database do not fail #165

Closed · nazarhussain closed this issue 3 years ago

nazarhussain commented 3 years ago

If the persisted database directory is accidentally deleted, read/write operations keep working; only the one call that fills the write buffer (set by writeBufferSize) fails.

Here is a script which tries some operations at a regular interval. The utility functions used are listed at the end of the issue.

  1. Generate random key
  2. Generate random value
  3. Add data key/value pair to db
  4. Get data key/value pair and match
  5. Keep repeating these steps
const rocksDB = require("rocksdb");
const { randomBytes } = require("crypto");

const run = async () => {
  const store = rocksDB("./data/my_rocks_db.db");
  await open(store);

  setInterval(async () => {
    const data = randomBytes(1024 * 1024); // 1 MB
    const key = randomBytes(20);
    await put(store, key, data);
    const storedData = await get(store, key);
    console.log(data.equals(storedData) ? "matching..." : "not matching...");
  }, 300);
};

run().catch(console.error);

If we execute the above code, it shows output like this.

> node rocksdb_scripts.js
matching...
matching...
matching...
matching...
matching...

Meanwhile, if we delete the actual persisted database directory:

rm -r ./data/my_rocks_db.db

Then we see the following output.

matching...
matching...
matching...
matching...
(node:16976) UnhandledPromiseRejectionWarning: Error: IO error: ./data/my_rocks_db.db/000024.log: No such file or directory
matching...
matching...
matching...
matching...
matching...
matching...
matching...
matching...

There are two inherent problems here:

  1. put keeps writing data without failure up to some threshold, which I believe is writeBufferSize (4 MB by default). The problem is that the running application is not aware that its state is not being persisted.
  2. The second and bigger problem: once the db realizes that the db files do not exist, it throws an error only once. Afterwards, every read/write operation NEVER throws any warning or error.

In contrast, if a similar situation happens with leveldb, every read and write operation fails with an error. I also tried the levelup wrapper and saw the same behaviour.

This is a rare scenario, but I believe the binding or database should be able to detect the non-persisted state and, in particular, fail every read/write operation that is no longer possible afterwards. Adding such logic to every application using rocksdb would be overwhelming and redundant.


Here is the code which promisifies the read/write operations.

const open = async (db) =>
  new Promise((resolve, reject) => {
    db.open((error) => {
      if (error) {
        return reject(error);
      }

      return resolve();
    });
  });

const put = async (db, key, data) =>
  new Promise((resolve, reject) => {
    db.put(key, data, (error) => {
      if (error) {
        return reject(error);
      }

      resolve();
    });
  });

const get = async (db, key) =>
  new Promise((resolve, reject) => {
    db.get(key, (error, data) => {
      if (error) {
        return reject(error);
      }

      resolve(data);
    });
  });
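
As an aside, since these are standard error-first callbacks, roughly the same helpers could probably be built with Node's util.promisify (a sketch; makeHelpers is just an illustrative name):

// Sketch: the same promisified helpers via util.promisify, binding each
// method so `this` stays the db instance.
const { promisify } = require("util");

const makeHelpers = (db) => ({
  open: promisify(db.open.bind(db)),
  put: promisify(db.put.bind(db)),
  get: promisify(db.get.bind(db)),
});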
vweevers commented 3 years ago

This is not something we can realistically handle. You can only safely delete the directory after closing the db in JS. An open db has open file handles, and deleting the directory is gonna lead to undefined behavior. Which is what you are seeing.

As for stopping when the error happens, that is the application's responsibility because it has an unhandled promise rejection. Either add a try/catch or run node with --unhandled-rejections=throw.
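
For the script above, a try/catch could look roughly like this (a minimal sketch; stopping the process on the first failure is just one possible policy):

// Sketch: handle rejections inside the interval callback instead of letting
// them become unhandled promise rejections.
setInterval(async () => {
  const data = randomBytes(1024 * 1024);
  const key = randomBytes(20);
  try {
    await put(store, key, data);
    const storedData = await get(store, key);
    console.log(data.equals(storedData) ? "matching..." : "not matching...");
  } catch (error) {
    console.error("read/write failed:", error);
    process.exit(1); // one possible policy: stop on the first failure
  }
}, 300);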

nazarhussain commented 3 years ago

@vweevers If deleting the data directory leads to undefined behavior, i.e. throwing exceptions and crashing the application, then I would say that is the intended behavior. If an action of that magnitude happens, no read or write operation should work, and every one of them must throw errors.

But that's not what happens. I am currently debugging another application that uses rocksdb with the levelup wrapper. After deleting the data directory, the application kept running for over a few hours without crashing with an error or showing any warning message. The first key that was added to the DB is still accessible and can be served by the API, which makes me think that when the data files are removed at runtime, every read/write operation is silently migrated to memory. That should never be the case.

I will share more information as I find it in my debugging.

vweevers commented 3 years ago

> and it must throw errors

That would be a defined behavior. Undefined behavior means that it might throw, it might not. Anything can happen.

vweevers commented 3 years ago

Let me put it this way: deleting the directory breaks a reasonable expectation in RocksDB that no external modifications will be made while the db is open. In any case, the IO is handled in RocksDB. There's nothing we can do here.

You may be able to detect a deletion by watching the filesystem. That would be out of scope for rocksdb. Good luck!
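
As a rough, untested sketch of that idea (polling with fs.existsSync; fs.watch would also work but its semantics are platform-dependent, and watchDbDir is just an illustrative name):

// Sketch: detect deletion of the db directory from the application by polling.
const fs = require("fs");

const watchDbDir = (path, onGone, intervalMs = 1000) => {
  const timer = setInterval(() => {
    if (!fs.existsSync(path)) {
      clearInterval(timer);
      onGone();
    }
  }, intervalMs);
  return timer;
};

// Usage with the directory from this thread:
// watchDbDir("./data/my_rocks_db.db", () => {
//   console.error("db directory disappeared");
//   process.exit(1);
// });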

nazarhussain commented 3 years ago

@vweevers I do agree that watching the filesystem is not within the scope of rocksdb.

But one thing is very clear: rocksdb is not an in-memory data store, it's a file-based key-value store, and persisting data to files is its core responsibility. If for any reason it can't persist a write operation, it should notify the user.

To my understanding, the problem is far deeper and more dangerous. Deleting the data directory is just one case in which rocksdb can't persist data and shadows this failure from the user. I suspect other cases are possible where a disk operation fails or the application crashes, and rocksdb cannot persist the data but keeps shadowing those failures.

The whole scenario can be summarized as: rocksdb doesn't provide a write-operation guarantee. Exploring the documentation of RocksDB itself, I didn't find any reference to this, so I tend to think it's some underlying issue in the binding.

nazarhussain commented 3 years ago

I just verified one more thing in the application using rocksdb with levelup. I wrote around 30 MB, with a maximum of 15 KB per write operation. It never threw a single error and kept all the data in memory, because the data files were actually removed as soon as the connection was opened. So it's not an issue with some write buffer, but rather RocksDB's behavior of switching from file-based persistence to in-memory persistence. Is this the default behavior of RocksDB? If yes, would you share some reference for it?

nazarhussain commented 3 years ago

For the reference of people who come across this issue: I found that the behaviour discussed here is native to RocksDB itself and not an issue in the binding. I used the following code to test it.

If we run this native code and delete the data directory while it runs, it never complains and never crashes; instead it silently switches over to an in-memory store.

#include <cassert>
#include <cstdio>
#include <iostream>
#include <string>
#include <unistd.h>

#include "rocksdb/db.h"
#include "rocksdb/slice.h"
#include "rocksdb/options.h"

using namespace ROCKSDB_NAMESPACE;
using namespace std;

std::string kDBPath = "rocksdb_simple_example.db";

int main() {
  DB* db;
  Options options;
  // Optimize RocksDB. This is the easiest way to get RocksDB to perform well
  options.IncreaseParallelism();
  options.OptimizeLevelStyleCompaction();
  // create the DB if it's not already present
  options.create_if_missing = true;

  int counter = 0; 

  // open DB
  Status s = DB::Open(options, kDBPath, &db);
  assert(s.ok());

  while(counter < 50) {
    // Put key-value
    string key = "key" + to_string(counter);
    s = db->Put(WriteOptions(), key, "value");
    assert(s.ok());
    std::string value;
    // get value
    s = db->Get(ReadOptions(), key, &value);
    assert(s.ok());
    assert(value == "value");
    usleep(300 * 1000);
    counter = counter + 1;
    std::cout << key << " - matched...\n";
  }

  delete db;

  return 0;
}
nazarhussain commented 3 years ago

@vweevers I found the actual reason behind this behaviour. It may be interesting for people following this thread.

> An open file is deleted only when all file descriptors are closed
>
> In addition to maintaining a link count for each i-node, the kernel also counts open file descriptions for the file (see Figure 5-2, on page 95). If the last link to a file is removed and any processes hold open descriptors referring to the file, the file won’t actually be deleted until all of the descriptors are closed.

The Linux Programming Interface by Michael Kerrisk, page 346

This Linux behavior is what allowed RocksDB to keep operating after the data directory was deleted by another process.
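
The effect is easy to reproduce directly in Node (a sketch; the file path is arbitrary and Linux semantics are assumed):

// Sketch: on Linux, writes through an already-open file descriptor keep
// succeeding after the file is unlinked; the data vanishes once the fd closes.
const fs = require("fs");

const fd = fs.openSync("/tmp/unlink-demo.txt", "w");
fs.writeSync(fd, "before unlink\n");
fs.unlinkSync("/tmp/unlink-demo.txt"); // remove the last link
fs.writeSync(fd, "after unlink\n");    // still succeeds: the fd is still open
console.log("both writes succeeded");
fs.closeSync(fd);                      // now the file is actually deleted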