Open binarytemple opened 9 years ago
This failure is happens in this line of bitcask.erl because we fail to close a bitcask file. In 1.4.1, this line looks like this:
ok = file:close(Fd),
The ok
was modified to _
by @slfritchie and @engelsanchez in this commit with the following commit message:
Buildbot isn't quite right in the head: I've no idea why I can run this test 3000x without failure, but Buildbot fails
bitcask_qc: merge2_test... =ERROR REPORT==== 5-May-2014::04:56:18 === ** Generic server <0.2324.0> terminating ** Last message in was {'DOWN',#Ref<0.0.0.50927>,process,<0.2317.0>,normal} ** When Server state == {state,undefined,undefined} ** Reason for termination == ** {{badmatch,{error,badarg}}, [{bitcask_file,handle_info,2,[{file,"src/bitcask_file.erl"},{line,193}]}, {gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,604}]}, {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
Fix: Really, we're stopping, so change the pattern match to '_' so we don't care.
@slfritchie, @engelsanchez, do you think that if this file:close
returns {error, badarg}
in production, that is a problem? Thank you.
Sounds like the file has already been explicitly closed before getting the 'DOWN' message. So this is called: https://github.com/basho/bitcask/blob/1b5ad684c1845f1cf7e8dd4f6fa5fe45e5baf3be/src/bitcask_file.erl#L150
And then we try to close the same already closed file descriptor and get a badarg. It seem it would be more robust to change the fd to undefined after closing it, and checking for undefined in the file:close for the 'DOWN' message.
Thank you, I'm happy with closing the ticket then.
Leave it open @hcs42. It's a small race worth fixing.
This is a 1.4.1 (bitcask 1.6.3) deployment. Seeing repeated crashes during merge operations. Couldn't find any similar existing issues. No expectation of fix in this version, but creating for identification purposes/in case it exists in 1.7.