basho / bitcask

because you need another a key/value storage engine
1.29k stars 173 forks source link

Obtain Bitcask write locks using system call `flock/2` #257

Closed andytill closed 2 years ago

andytill commented 6 years ago

Previous to this change, the OS pid was used to lock the bitcask database and prevent two bitcask instances performing write operations at the same time. If the pid in the lock file is currently taken by an OS process then the bitcask DB may not be written to. Bitcask is not able to clean up the write locks because of a hard shutdown it may not be able to delete the "stale" lock file and obtain a new one for the following reasons:

This has been raised in issues #72, #226 and basho/riak#535.

This change uses the flock system call to obtain exclusive write access to the file across OS processes, as long as the other processes use flock to obtain the lock.

The change maintains the previous behaviour where the DB can be opened even if a write lock cannot be obtained, but returns an error on write operations.

The O_EXCL flag has been removed from the call to open in the nifs because if the lock file still exists, it returns the eexist error but now the proof of the write lock is not the file existing but the call to flock.

russelldb commented 6 years ago

Is this something for which it is even possible to create a Riak test for?

russelldb commented 6 years ago

Oh yeah, and if you do merge it, please close the related issues.

ghost commented 6 years ago

Is there a Riak top level project branch somewhere which is already configured to use your branch @andytill ? The code looks +1 but , just to be sure, I would like to run it against some data if possible.

andytill commented 6 years ago

Sorry no, it's all linked in our internal riak_ee.

Is there a Riak top level project branch somewhere which is already configured to use your branch @andytill ? The code looks +1 but , just to be sure, I would like to run it against some data if possible.

Licenser commented 6 years ago

Might it be worth considering fnctl instead of flock for portabilities sake?

https://www.perkin.org.uk/posts/solaris-portability-flock.html

is a good summary of the topic.

russelldb commented 6 years ago

Thanks @Licenser. You linked to https://github.com/basho/leveldb/blob/44cb7cbc85590280c9a73856470d5880f4015927/util/env_posix.cc#L493 in slack, I'm just commenting here to bring that link into this discussion

andytill commented 6 years ago

Very difficult I think to reproduce the OS pid on the restarted Riak to provoke the original bug. With the change it is possible to trigger the locked error by starting starting Riak, perform one write, starting a second Riak and performing a write with that which will return an error.

Is this something for which it is even possible to create a Riak test for?

Licenser commented 6 years ago

Oh yes, sorry @russelldb should have added that too.

russelldb commented 6 years ago

I’m happy without a riak_test, if there is no test to show the fault pre-fix.

On 6 Mar 2018, at 10:58, Andy Till notifications@github.com wrote:

Very difficult I think to reproduce the OS pid on the restarted Riak to provoke the original bug. With the change it is possible to trigger the locked error by starting starting Riak, perform one write, starting a second Riak and performing a write with that which will return an error.

Is this something for which it is even possible to create a Riak test for?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

Vagabond commented 6 years ago

I'd definitely echo the suggestion to look at fnctl, the link Heinz posted even has some compatibility shims in addition to the eleveldb one.

ghost commented 6 years ago

On the CI server (my laptop) the quickcheck tests fail - output attached.

bitcast-test-failure.txt

It would be brilliant if we could get the tests to pass - then +1 from me.

With the current implementation I caused data loss (last year) when opening a bitcask data directory twice, despite the the second open was flagged as read_only (via a second beam) any attention to this area feels useful.

Imho - fnctl is only a consideration for Solaris users - @heinz is not using Bitcask on Solaris, which means there are zero stakeholders I know of for whom this would cause direct inconvenience.

ghost commented 6 years ago

wrt above comment - emfile - I've increased file descriptors and rerun the quickcheck tests.

russelldb commented 6 years ago

@bryanhuntesl that failure is emfile, are you sure you have your ulimit high enough?

ghost commented 6 years ago

Increased emfile limit to 10000 tests still failing - output attached.

bitcask-tests-failure.txt

russelldb commented 6 years ago

@bryanhuntesl can you add some info (erl + eqc version, platform) I ran this 10 times today without failure, on my macbook

ghost commented 6 years ago

Certainly - Macbook - i7/16GB

Erlang R16B02-basho1 (erts-5.10.3) [source] [64-bit] [smp:8:8] [async-threads:10] [hipe] [kernel-poll:false]

▷  echo $ERL_LIBS (re-formatted)
:/Users/bryanhunt/quickcheck-1.36.1:
/Users/bryanhunt/quickcheck-1.36.1/eqc-1.36.1:
/Users/bryanhunt/quickcheck-1.36.1/pulse-1.36.1:
/Users/bryanhunt/quickcheck-1.36.1/pulse_otp-1.36.1
russelldb commented 6 years ago

Not gonna moan about basho1 (should be basho10). Interesting. All the same as me. Ah well. Good luck someone with that one!

ghost commented 6 years ago

Just to confirm - I am running 16b02-basho10 - seems like it's been truncated in the erl console output.

Licenser commented 6 years ago

I am using bitcask on SmartOS. But that aside I see little reason not to use a POSIX compliant function when there is one.

ghost commented 6 years ago

Sorry, my mistake, I thought you were doing everything with raw files/Postgres.

Licenser commented 6 years ago

So many systems ;) in dalmatinerDB yes, it's files/postgres/(leveldb sadly that's tied to _core), snarl my AAA system uses bitcask for the timeouts. I'm happy to make a pullrequest on the pullrequest to move it to fnctl, it seems just to be once place that needs changes.

ghost commented 6 years ago

Thanks for clarification @Licenser

andytill commented 6 years ago

@Licenser thanks for offering to make this Solaris compatible, I'd welcome this change to the PR.

Licenser commented 6 years ago

The lock test does seem to fail on OSX (unmodified code):

    bitcask_lockops_tests: lock_cannot_be_obtained_on_already_locked_file_within_same_os_process_test...*failed*
in function bitcask_lockops_tests:'-lock_cannot_be_obtained_on_already_locked_file_within_same_os_process_test/0-fun-0-'/1 (test/bitcask_lockops_tests.erl, line 34)
**error:{assertEqual,[{module,bitcask_lockops_tests},
              {line,34},
              {expression,"bitcask_lockops : acquire ( write , Dir )"},
              {expected,{error,locked}},
              {value,{ok,<<>>}}]}
  output:<<"">>

update 1

Odd this did work on the second run, not sure as to why.

update 2

The eqc tests segfault :(

  bitcask_nifs: keydir_get_put_test_...........................................................................................make: *** [eunit_nif] Segmentation fault: 11

update 3

the calls tack would be:

Process 22623 stopped
* thread #16, name = '2_scheduler', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000000147dbea8 beam.smp`enif_send(env=0x00000000b06ba868, to_pid=<unavailable>, msg_env=0x0000000015f40818, msg=566859) at erl_nif.c:324 [opt]
   321
   322      if (env != NULL) {
   323      c_p = env->proc;
-> 324      if (receiver == c_p->common.id) {
   325          rp_locks = ERTS_PROC_LOCK_MAIN;
   326          flush_me = 1;
   327      }
Target 0: (beam.smp) stopped.
(lldb) bt
* thread #16, name = '2_scheduler', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000000147dbea8 beam.smp`enif_send(env=0x00000000b06ba868, to_pid=<unavailable>, msg_env=0x0000000015f40818, msg=566859) at erl_nif.c:324 [opt]
    frame #1: 0x0000000014aea28a bitcask.so`msg_pending_awaken(env=0x00000000b06ba868, keydir=0x00007fa0dbf0ba10, msg=566859) at bitcask_nifs.c:2752
    frame #2: 0x0000000014ae5c4f bitcask.so`merge_pending_entries(env=0x00000000b06ba868, keydir=0x00007fa0dbf0ba10) at bitcask_nifs.c:2826
    frame #3: 0x0000000014ae5abd bitcask.so`itr_release_internal(env=0x00000000b06ba868, handle=0x0000000015ac1850) at bitcask_nifs.c:1995
    frame #4: 0x0000000014ae6449 bitcask.so`bitcask_nifs_keydir_resource_cleanup(env=0x00000000b06ba868, arg=0x0000000015ac1850) at bitcask_nifs.c:2920
    frame #5: 0x00000000147de2d2 beam.smp`nif_resource_dtor(bin=<unavailable>) at erl_nif.c:1420 [opt]
    frame #6: 0x00000000147b2173 beam.smp`sweep_off_heap [inlined] erts_bin_free(bp=0x0000000015ac1828) at erl_binary.h:331 [opt]
    frame #7: 0x00000000147b2165 beam.smp`sweep_off_heap(p=0x00000000161824b0, fullsweep=0) at erl_gc.c:2371 [opt]
    frame #8: 0x00000000147b04ec beam.smp`erts_garbage_collect [inlined] do_minor(p=<unavailable>, new_sz=<unavailable>, objv=<unavailable>, nobj=<unavailable>) at erl_gc.c:1180 [opt]
    frame #9: 0x00000000147af16f beam.smp`erts_garbage_collect at erl_gc.c:880 [opt]
    frame #10: 0x00000000147aeef2 beam.smp`erts_garbage_collect(p=<unavailable>, need=<unavailable>, objv=0x00000000b06bac78, nobj=1) at erl_gc.c:452 [opt]
    frame #11: 0x00000000147ada07 beam.smp`erts_gc_after_bif_call(p=0x00000000161824b0, result=<unavailable>, regs=<unavailable>, arity=<unavailable>) at erl_gc.c:373 [opt]
    frame #12: 0x00000000146b3418 beam.smp`process_main at beam_emu.c:2881 [opt]
    frame #13: 0x00000000147450a1 beam.smp`sched_thread_func(vesdp=0x00000000151782c0) at erl_process.c:8118 [opt]
    frame #14: 0x000000001486ea92 beam.smp`thr_wrapper(vtwd=0x00007ffeeb55ae50) at ethread.c:114 [opt]
    frame #15: 0x00007fff717ab6c1 libsystem_pthread.dylib`_pthread_body + 340
    frame #16: 0x00007fff717ab56d libsystem_pthread.dylib`_pthread_start + 377
    frame #17: 0x00007fff717aac5d libsystem_pthread.dylib`thread_start + 13
andytill commented 6 years ago

Still waiting to get my eqc license approved by work, then I'll try to reproduce.

llelf commented 5 years ago

Withfcntl(…, F_SETLK, …) (the only thing POSIX has), any close would lead to the process losing all its locks to that file.