deephacks / lmdbjni

LMDB for Java
Apache License 2.0

deadlock in org.fusesource.lmdbjni.JNI.mdb_txn_begin? #70

Open jayenashar opened 8 years ago

jayenashar commented 8 years ago

I'm trying to convert from my old db to lmdb, using 8 threads in parallel. There are about 20,000 entries, but after roughly 6,000 calls to org.fusesource.lmdbjni.Database#put(byte[], byte[]), I either hit a SIGSEGV or the threads all lock up with this stack trace:

      at org.fusesource.lmdbjni.JNI.mdb_txn_begin(JNI.java:-1)
      at org.fusesource.lmdbjni.Env.createTransaction(Env.java:453)
      at org.fusesource.lmdbjni.Env.createWriteTransaction(Env.java:411)
      at org.fusesource.lmdbjni.Database.put(Database.java:394)
      at org.fusesource.lmdbjni.Database.put(Database.java:386)

I'm using my fork of 0.4.7-SNAPSHOT. Increasing the map size works, but I would have expected an MDB_MAP_FULL error instead of a hang.

krisskross commented 8 years ago

Try a single thread and use an external transaction instead; commit when done.

Example:

    try (Transaction tx = env.createWriteTransaction()) {
      db.put(tx, key, val);
      ...
      tx.commit(); // commit once at the end; closing without commit aborts the transaction
    }

krisskross commented 8 years ago

LMDB is single-writer so other writing threads would block anyway.

If your application SIGSEGVs, you're probably using the API incorrectly. Show me the code and I may be able to help you.

jayenashar commented 8 years ago

Yes, single-threaded, it gives an MDB_MAP_FULL. I haven't tried it with a single transaction, but I imagine that would not hang or SIGSEGV.

I got trigger-happy and deleted my old db, but here's a small test case with the same symptoms.

    @Test
    public void testStress() {
      // 8 workers hammering db.put concurrently on the common ForkJoinPool
      Collections.nCopies(8, null).parallelStream().forEach(ignored -> {
        Random random = new Random();
        for (int i = 0; i < 15000; i++) {
          db.put(bytes(Long.toString(random.nextLong())), bytes(Long.toString(random.nextLong())));
        }
      });
    }

krisskross commented 8 years ago

Hmm. Yes, I get the same SIGSEGV. The test runs fine with an ExecutorService, though.

    ExecutorService service = Executors.newFixedThreadPool(8);
    service.execute(() -> {
      Random random = new Random();
      for (int i = 0; i < 15000; i++) {
        db.put(bytes(Long.toString(random.nextLong())), bytes(Long.toString(random.nextLong())));
      }
    });

MDB_MAP_FULL means the database is full. You can increase the size by calling Env.setMapSize before opening the environment.
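
For example, a minimal sketch (the path and 512 MiB size are illustrative; the enclosing setup is assumed, but setMapSize has to be called before open):

    Env env = new Env();
    env.setMapSize(512, ByteUnit.MEBIBYTES); // reserve up to 512 MiB of address space
    env.open("/tmp/mydb");                   // size the environment before opening it
    Database db = env.openDatabase();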

krisskross commented 8 years ago

The SIGSEGV happens when the transaction aborts, which is strange. A put should always succeed, or block and then succeed once the write lock is released.

I can also see the hang now sometimes. Sometimes a thread seems to hang in mdb_put, and sometimes all threads block at mdb_txn_begin.

And all this only happens with ForkJoin. Weird.

krisskross commented 8 years ago

It may well be a bug in LMDB. But maybe not, since it works with an ExecutorService.

jayenashar commented 8 years ago

I'm not sure I understand your ExecutorService example. Doesn't that only execute the loop once and not 8 times concurrently?

Yeah, I got it all working (serial and parallel) with Env.setMapSize. Sorry I didn't mention that earlier.

I haven't observed it hang in mdb_put, but I suppose at this point you've run it more times than I have. Hopefully all three symptoms will have the same fix.

krisskross commented 8 years ago

Sorry, my bad about the ExecutorService example.
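
For the record, a corrected sketch that actually runs 8 concurrent writers and waits for them to finish (assuming the same db and bytes test fixtures as above, and that the enclosing test method declares throws InterruptedException):

    ExecutorService service = Executors.newFixedThreadPool(8);
    for (int t = 0; t < 8; t++) {
      service.execute(() -> {
        Random random = new Random();
        for (int i = 0; i < 15000; i++) {
          db.put(bytes(Long.toString(random.nextLong())), bytes(Long.toString(random.nextLong())));
        }
      });
    }
    service.shutdown();
    service.awaitTermination(10, TimeUnit.MINUTES); // block until all writers are done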

The problem only seems to manifest when the database is too small. Do you see this as well?

jayenashar commented 8 years ago

    env.setMapSize(50, ByteUnit.MEBIBYTES); - OK
    env.setMapSize(30, ByteUnit.MEBIBYTES); - OK
    env.setMapSize(20, ByteUnit.MEBIBYTES); - OK
    env.setMapSize(15, ByteUnit.MEBIBYTES); - OK
    env.setMapSize(10, ByteUnit.MEBIBYTES); - OK
    env.setMapSize(5, ByteUnit.MEBIBYTES);  - SIGSEGV
    env.setMapSize(8, ByteUnit.MEBIBYTES);  - hang in mdb_txn_begin
    env.setMapSize(9, ByteUnit.MEBIBYTES);  - OK

I thought 10 MiB was the default map size, so I'm not sure why it has issues without setting it explicitly.