
bitcask:merge/1 keeps file descriptors of deleted files open [JIRA: RIAK-2814] #249

Closed janskrz closed 7 years ago

janskrz commented 7 years ago

I am using bitcask without riak, with bitcask's default configuration. I noticed that file descriptors of deleted files are not closed whenever I call bitcask:merge/1. I have tested this with Erlang/OTP 18.1 and 17.5.

The following snippet fills bitcask with random binaries and calls merge:

Dir = "/run/shm/bitcask".
Cask = bitcask:open(Dir, [read_write]).
RandSize = fun() -> random:uniform(1020*1024) + (4 * 1024) end.
RandBin = fun() -> crypto:rand_bytes(RandSize()) end.
Fill = fun(From, To) -> [bitcask:put(Cask, integer_to_binary(X), RandBin()) || X <- lists:seq(From, To)] end.
[Fill(1, 3000) || _X <- lists:seq(1, 3)].
bitcask:merge(Dir).

lsof output after running the snippet:

bzcskrzy@cumu02-15:/run/shm/bitcask$ lsof -c beam | grep /run/shm
beam.smp 21255 bzcskrzy    8u   REG   0,21         38    119191 /run/shm/bitcask/bitcask.write.lock
beam.smp 21255 bzcskrzy   10u   REG   0,21 2147352145    115060 /run/shm/bitcask/1.bitcask.data (deleted)
beam.smp 21255 bzcskrzy   11u   REG   0,21 2146712969    119194 /run/shm/bitcask/2.bitcask.data (deleted)
beam.smp 21255 bzcskrzy   12u   REG   0,21  424816644    117958 /run/shm/bitcask/3.bitcask.data
beam.smp 21255 bzcskrzy   13u   REG   0,21      35596    119291 /run/shm/bitcask/3.bitcask.hint  

Repeatedly filling and merging bitcask will accumulate more and more such FDs, which are not released until bitcask:close/1 is called.
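
For completeness, the leak can also be observed from the Erlang shell without lsof. Below is a rough sketch, assuming Linux's /proc filesystem and the Dir, Cask and Fill bindings from the snippet above; CountDeletedFds is just an ad-hoc helper, not part of bitcask:

CountDeletedFds = fun() ->
    FdDir = "/proc/" ++ os:getpid() ++ "/fd",
    {ok, Fds} = file:list_dir(FdDir),
    %% count descriptors whose symlink target ends in "(deleted)"
    length([F || F <- Fds,
                 {ok, T} <- [file:read_link(filename:join(FdDir, F))],
                 lists:suffix("(deleted)", T)])
end.
[begin
     Fill(1, 3000),
     bitcask:merge(Dir),
     io:format("deleted fds after merge: ~p~n", [CountDeletedFds()])
 end || _ <- lists:seq(1, 3)].

Each round prints a growing count, matching the lsof output above.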

UPDATE: It seems like I can work around this problem by manually retrieving the open files from bitcask's bc_state record in the process dictionary and calling bitcask_fileops:close_all/1.

8> State = erlang:get(Cask).
{bc_state,"/run/shm/bitcask",
          {filestate,read_write,"/run/shm/bitcask/3.bitcask.data",3,
           <0.52.0>,<0.53.0>,3626079127,424816644,423940347,22,
                     2058954717},
          <<>>,
          [{filestate,read_only,"/run/shm/bitcask/2.bitcask.data",2,
                      <0.50.0>,undefined,0,2146712969,2145783270,22,1423779256},
           {filestate,read_only,"/run/shm/bitcask/1.bitcask.data",1,
                      <0.48.0>,undefined,0,2147352145,2146439343,22,2556499941}],
          2147483648,
          [{expiry_secs,-1},read_write],
          #Fun<bitcask.21.19258060>,<<>>,1,2}
9> Files = element(5, State).
[{filestate,read_only,"/run/shm/bitcask/2.bitcask.data",2,
            <0.50.0>,undefined,0,2146712969,2145783270,22,1423779256},
 {filestate,read_only,"/run/shm/bitcask/1.bitcask.data",1,
            <0.48.0>,undefined,0,2147352145,2146439343,22,2556499941}]
10> bitcask_fileops:close_all(Files).
ok

UPDATE 2: I just noticed that bitcask:needs_merge/2 already closes file descriptors that were left open by a previous merge. This fixes my problem without relying on hacks against bitcask's internal state. Maybe this behaviour should be documented somewhere in case someone else runs into the same problem.
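
For reference, the workaround now boils down to this (a sketch only, reusing the Cask and Dir bindings from the snippet above; I am calling needs_merge with just the reference here, i.e. the variant without an options list):

bitcask:merge(Dir).
bitcask:needs_merge(Cask).

The return value only says whether another merge is warranted; the useful part here is the side effect of closing the descriptors of the data files that the merge deleted.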

Thanks, Jan

Basho-JIRA commented 7 years ago

This is fixed as part of fixing https://github.com/basho/bitcask/issues/251. We need to bring NIF mode and Erlang mode into lock step with the options passed into the open call.

[posted via JIRA by Brian Sparrow]

Basho-JIRA commented 7 years ago

Addressed by adding the O_CREAT flag to open_file in NIF mode and educating CSEs on the needs_merge logic, which cleans up leftover file descriptors.

[posted via JIRA by Brian Sparrow]