facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org

Iterator crashes JVM on closed database (RocksJava) #5234

Status: Open. aecio opened this issue 5 years ago

aecio commented 5 years ago

This bug report is for the RocksJava JNI wrapper (any recent version).

After closing the database object, if we try to read from an iterator associated with the closed database, the JVM crashes. I'm not sure whether this is by design or whether the expected behavior would be to return null or throw something like an IllegalStateException. In my application, I call the iterator's isValid() method right before reading from the iterator, but the JVM crash still happens because of a race condition in which the database is closed in another thread.

Expected behavior

Should it throw an IllegalStateException? Return null? Or are users expected to always synchronize all iterator accesses on the RocksDB database object?
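
For context, the kind of external synchronization the last question refers to would look something like this minimal sketch; the lock object and the reader/closer split are illustrative, not part of the RocksJava API (db and it are the database and iterator from the test case below):

    final Object dbLock = new Object();

    // Reader thread: check validity and read under the lock...
    synchronized (dbLock) {
        if (it.isValid()) {
            byte[] v = it.value();
        }
    }

    // ...which the closing thread must also take before tearing down.
    synchronized (dbLock) {
        it.close();
        db.close();
    }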

Actual behavior

JVM crashes:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f94c174b979, pid=23521, tid=0x00007f956e988700
#
# JRE version: OpenJDK Runtime Environment (8.0_191-b12) (build 1.8.0_191-8u191-b12-2ubuntu0.16.04.1-b12)
# Java VM: OpenJDK 64-Bit Server VM (25.191-b12 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [librocksdbjni4452423069810647234.so+0x374979]  rocksdb::MemTableIterator::value() const+0x19
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /home/username/workspace/myapp/hs_err_pid23521.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

Steps to reproduce the behavior

The following test case crashes the JVM.

    import static org.junit.Assert.assertFalse;

    import java.io.IOException;

    import org.junit.Rule;
    import org.junit.Test;
    import org.junit.rules.TemporaryFolder;
    import org.rocksdb.Options;
    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksDBException;
    import org.rocksdb.RocksIterator;

    public class RocksIteratorCrashTest {

        static {
            RocksDB.loadLibrary();
        }

        @Rule
        public TemporaryFolder tempFolder = new TemporaryFolder();

        @Test
        public void shouldCrashJavaRocks() throws IOException, RocksDBException {
            String path = tempFolder.newFolder().toString();
            Options options = new Options();
            options.setCreateIfMissing(true);
            RocksDB db = RocksDB.open(options, path);
            byte[] key = {0x1};
            byte[] value = {0x2};
            db.put(key, value);

            RocksIterator it = db.newIterator();
            db.close();

            assertFalse(it.isValid()); // passes, yet the next call still crashes

            it.value(); // crashes the JVM
        }
    }

edwinchoi commented 5 years ago

+1 I really wish the Java API placed more emphasis on detecting unstable states and making them observable from the JVM, rather than relying on the system to abend the process when it tries to do something that isn't kosher. In your case, if you enable assertions you should get an AssertionError, but only because the disposal of the handle was done in Java. If an object becomes unreachable and is finalized while still live in the native layer, your application will crash without any clear indication of why.

I don't think it's unreasonable for a parent to have to keep track of its child resources, and doing so would lead to fewer surprises and a flatter learning curve. There is already non-trivial overhead in the JNI library and the amount of copying it does; adding an "if" statement whose condition is relatively constant (and predictable after a few runs) would not add much on top. The cost of a couple of potential branch mispredictions pales in comparison to the cache misses that are inevitable under a randomized workload and, more importantly, to the time lost by the individuals debugging the segfaults, in all their opaque glory.

An exception to this is when ownership of a resource is explicitly transferred. Both sides have to agree on it, though. Here is a case where they do not agree: you can pass a Slice to ReadOptions, and ReadOptions will retain the slice; but when you use those options to create an iterator in a separate lexical scope, the slice may be finalized and segfault the process.
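
A minimal sketch of that mismatch, using ReadOptions.setIterateUpperBound (which does exist in the RocksJava API); the method name and bound value here are otherwise illustrative:

RocksIterator newBoundedIterator(RocksDB db) {
    ReadOptions readOptions = new ReadOptions();
    Slice upperBound = new Slice(new byte[]{0x7f});
    readOptions.setIterateUpperBound(upperBound);
    // The iterator escapes this scope; readOptions and upperBound do not.
    // Once they become unreachable and are finalized, the native slice can
    // be freed while the C++ iterator still holds a pointer to it.
    return db.newIterator(readOptions);
}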

adamretter commented 4 years ago

The book-keeping required would not be insignificant. We would need to add an atomic check to every function call.

edwinchoi commented 4 years ago

Are you concerned with the number of changes or the potential cost of the volatile reads?

For the former, you could do this easily by refactoring through IntelliJ. You can encapsulate access to the nativeHandle_ field behind a method and verify in its body that the pointer hasn't been disposed. That alone wouldn't prevent an abend; for that you would need a reader-writer semaphore with nested lifecycle management. I imagine this wouldn't be too difficult to add through a SharedMonitor that exposes enterShared/exitShared, enterClose/exitClose, and createChild methods, plus a Monitor that exposes enter/exit methods and is also the return type of createChild. The semantics of enterClose/exitClose would be similar to an exclusive lock, with the caveat that they are called only once, and that after their invocation any subsequent attempt to acquire the shared lock fails with an exception.
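
A sketch of that SharedMonitor idea, with the names taken from the paragraph above; the body is an illustration built on ReentrantReadWriteLock rather than a definitive design:

import java.util.concurrent.locks.ReentrantReadWriteLock;

final class SharedMonitor {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean closed; // guarded by lock

    void enterShared() {
        lock.readLock().lock();
        if (closed) {
            lock.readLock().unlock();
            throw new IllegalStateException("resource already closed");
        }
    }

    void exitShared() {
        lock.readLock().unlock();
    }

    void enterClose() {
        lock.writeLock().lock();
        if (closed) {
            lock.writeLock().unlock();
            throw new IllegalStateException("enterClose() may only be called once");
        }
        closed = true; // every later enterShared() now fails
    }

    void exitClose() {
        lock.writeLock().unlock();
    }

    // A child monitor (e.g. an iterator's) would be closed by the parent
    // before the parent disposes its own native handle; that nested
    // lifecycle wiring is omitted here.
    SharedMonitor createChild() {
        return new SharedMonitor();
    }
}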

Since the majority of accesses are shared, you could optimize for that case by using some form of spin lock, with a yielding/sleeping wait strategy for the exclusive lock. And if there's contention on the readers, you can stripe the counters and separate "acquires" from "releases" (the exclusive lock waits for their sums to be equal). But that should come only after trying out ReentrantReadWriteLock and measuring the overhead on common operations. You can use IntelliJ's structured replace to wrap the method bodies with the enter/exit calls, as in the sketch below.
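
Wrapped that way, a method like RocksIterator.value() would look roughly like this; value0 stands in for the underlying native method, which is an assumption of the sketch:

public byte[] value() {
    monitor.enterShared(); // fails fast if the database was already closed
    try {
        return value0(nativeHandle_);
    } finally {
        monitor.exitShared();
    }
}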

This won’t address cases where an object allocated in a different lexical scope becomes unreachable and gets GC’d, but I would argue that’s out of scope for synchronizing the life cycle of RocksDB with RocksIterator, which is the most common pain point.

For the latter (performance), uncontended atomic reads on modern x64 should be nothing more than a read from L1. You can also disable the checks altogether with a system property read into a static final field, and create no-op implementations of the Monitor; Oracle's C2 (server) compiler will recognize and eliminate the dead code. For the client compiler, other vendors' VMs, and other CPU architectures I have no immediate insight, but I expect the overhead would pale in comparison to the costs already incurred by the JNI implementation: GetByteArrayElements, for example, does a "malloc" for non-empty arrays and copies the memory.
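
A sketch of that kill switch; the property name is made up for illustration:

// Checks default to on; -Drocksdb.disableStateChecks=true turns them off,
// at which point C2 folds the constant and eliminates the branch.
static final boolean STATE_CHECKS =
    !Boolean.getBoolean("rocksdb.disableStateChecks");

private void checkOpen() {
    if (STATE_CHECKS && !isOwningHandle()) {
        throw new IllegalStateException("native handle has already been closed");
    }
}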

Barring the adoption of these changes, there are some simpler changes that would make it feasible for someone to extend your classes and build a safe implementation on top: specifically, adding factory classes/methods for creating the RocksDB and RocksIterator instances, or adding a method that transfers ownership of a handle from one object to another.

adamretter commented 3 years ago

the potential cost of the volatile reads?

I am concerned about the overhead of doing such a check on effectively every operation that RocksDB does. Whilst I understand that for one operation the overhead is low, once you scale this up to millions of operations, I doubt that it is still insignificant.

edwinchoi commented 3 years ago

@adamretter

I am concerned about the overhead of doing such a check on effectively every operation that RocksDB does. Whilst I understand that for one operation the overhead is low, once you scale this up to millions of operations, I doubt that it is still insignificant.

volatile doesn't always go to main memory; it's just like resolving any other address... L1 -> L2 -> L3.

You can scale it up to billions and the overhead would still be trivial. Processors will not contend on reads when there are no writers (memory-wise) on amd64/x86-64 architectures (the Intel SDM confirms this, as does running with -XX:+PrintAssembly). The cost of doing a volatile read in Java on amd64 is the same as doing a mov reg, mem. You could potentially have contention on the cache line if multiple objects sit within the same 64-byte stride; however, only stores will lock the bus (Java doesn't use fences). So long as the object is in L1, the cost is negligible compared to the overhead of making the JNI call and the cost of the JNI implementation.

Also, using an AtomicBoolean adds a level of indirection. Using AtomicIntegerFieldUpdater would eliminate the need to dereference a second object, so I would suggest taking that route instead.

import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

abstract class AbstractImmutableNativeReference {
  private static final AtomicIntegerFieldUpdater<AbstractImmutableNativeReference> OWNING_HANDLE =
      AtomicIntegerFieldUpdater.newUpdater(AbstractImmutableNativeReference.class, "owningHandle_");

  // 1 while this object owns the native handle, 0 once closed or disowned
  private volatile int owningHandle_ = 1;

  @Override
  protected void finalize() {
    if (owningHandle_ == 1) {
      // log a warning: the native resource was never released via close()
    }
  }

  public boolean isOwningHandle() {
    return owningHandle_ != 0;
  }

  protected final void disOwnNativeHandle() {
    if (!OWNING_HANDLE.compareAndSet(this, 1, 0)) {
      throw new IllegalStateException("Cannot transfer ownership of something you don't own.");
    }
  }

  public final void close() {
    if (OWNING_HANDLE.compareAndSet(this, 1, 0)) {
      disposeInternal();
    }
  }

  // releases the native resource backing this object
  protected abstract void disposeInternal();
}

Here, only disOwnNativeHandle and close do a lock cmpxchg, and that bus lock is the only point where contention comes into play. Though I find it questionable to name a mutable object as being immutable, and for an object that no longer has any claim over a resource to still be able to access the pointer. Passing a guarded proxy object would make more sense.
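
A guarded proxy could look roughly like this; the class name is hypothetical, and it assumes the raw handle is captured at construction:

// Callers receive this view instead of the raw long; every access re-checks
// that the owner still holds the native handle.
final class GuardedHandle {
    private final AbstractImmutableNativeReference owner;
    private final long handle;

    GuardedHandle(AbstractImmutableNativeReference owner, long handle) {
        this.owner = owner;
        this.handle = handle;
    }

    long get() {
        if (!owner.isOwningHandle()) {
            throw new IllegalStateException("native handle is no longer owned");
        }
        return handle;
    }
}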

In C++, when you delete a pointer (and the reference exists outside the lexical scope), you would typically assign nullptr to it. Yet the same care and safety guards to make sure that your references are valid are not present in the Java code.

As I noted before, you can always use a static final boolean flag to control whether or not the strict checks run; the C2 compiler will optimize away the disabled checks. The checks really should be the default behavior, though. Users can turn them off as an optimization, but making them opt-in would be like a safety feature that could prevent a kernel panic being shipped disabled.

I'm unfamiliar with ARM and with Android's VM, but I would find it extremely surprising if a single check were more expensive than the JNI overhead (copying to conform to C calling conventions, plus a safepoint poll after the call) and the actual work done in the method. Searching a memtable acquires a spinlock that falls back to yielding to the scheduler; a read from memory can't possibly be more expensive than that. And if L1 cache misses ever became the dominant cost, the object layout would be the thing to look at, not the check.