An interesting observation is that adding or removing one byte to the Uint8Arrays (i.e., varying their length) causes the snippet to work properly on our test machine.
@paberr Why are you using a Uint8Array with putString? If you want to work with binary data, I would suggest the node Buffer instead.
We use Uint8Array since we use the same application code for node and the browser. However, the program also crashes when we replace the Uint8Array with a Buffer.
I cannot reproduce the crash. To me it just outputs "All done" and exits normally. I use node v8.9.4 on Fedora 27. Which node version and OS do you use, and are you sure you use the latest node-lmdb?
I just noticed you use the Uint8Array as a key. This has never been tested, especially since your configuration expects a string key. If you want to use a binary key, you should specify keyIsBuffer: true to env.openDbi and use a node Buffer. The fact that it doesn't throw an error for your input is a bug in itself.
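For completeness, this is roughly what I mean (untested sketch; the database name and path are made up, but keyIsBuffer, openDbi and putBinary are the node-lmdb API):

var lmdb = require('node-lmdb');

var env = new lmdb.Env();
env.open({ path: './testdata', maxDbs: 2 });

// Open the database with binary keys enabled.
var dbi = env.openDbi({ name: 'mydb', create: true, keyIsBuffer: true });

var txn = env.beginTxn();
// Both the key and the value are node Buffers here.
txn.putBinary(dbi, Buffer.from([1, 2, 3]), Buffer.from([4, 5, 6]));
txn.commit();

dbi.close();
env.close();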
So, I investigated a little. It looks like node thinks that a Uint8Array is a buffer: node::Buffer::HasInstance actually returns true when given the Uint8Array, and node::Buffer::Data and node::Buffer::Length then work with it correctly too.
Looking further into the code, it is no longer necessary to specify the key type to openDbi; in this case it will infer the key type for you.
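So something like this should also work without keyIsBuffer (rough sketch; I'm assuming here that the inference kicks in when a Buffer key is passed to putBinary):

// Same env as above; no keyIsBuffer needed when a Buffer key is used.
var dbi = env.openDbi({ name: 'mydb', create: true });

var txn = env.beginTxn();
txn.putBinary(dbi, Buffer.from([1, 2, 3]), Buffer.from([4, 5, 6]));
txn.commit();
dbi.close();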
Sorry for the noise. So far I have not been able to reproduce the problem. Hopefully knowing the OS and node version will help.
It would also help to see the stack trace from a debug build, i.e. node-gyp configure --debug.
I was able to do some tests with the native code. The problem seems to occur only when compiling with gcc and the -O3 flag (-O2 or clang are fine). gcc with -O3 seems to be the default for release builds on Linux systems, and setting -O2 in binding.gyp is simply ignored. Debug builds on Linux default to -O0 (even if -O3 is set in binding.gyp), so just enabling debug makes the bug no longer visible.
I manually did a debug build with -O3 and -g; this is the resulting stack trace:
Thread 1 "node" received signal SIGSEGV, Segmentation fault.
mdb_cursor_put (mc=<optimized out>, key=0x7fffffffc4e0, data=0x7fffffffc4f0, flags=<optimized out>) at ../dependencies/lmdb/libraries/liblmdb/mdb.c:7652
7652 mp->mp_ptrs[i] = fp->mp_ptrs[i] + offset;
(gdb) bt
#0 mdb_cursor_put (mc=<optimized out>, key=0x7fffffffc4e0, data=0x7fffffffc4f0, flags=<optimized out>) at ../dependencies/lmdb/libraries/liblmdb/mdb.c:7652
#1 0x00007ffff48b88d6 in mdb_put (txn=0x228c040, dbi=2, key=key@entry=0x7fffffffc4e0, data=data@entry=0x7fffffffc4f0, flags=0) at ../dependencies/lmdb/libraries/liblmdb/mdb.c:9847
#2 0x00007ffff48c0296 in TxnWrap::putCommon (info=..., fillFunc=0x7ffff48bf190 <TxnWrap::<lambda(Nan::NAN_METHOD_ARGS_TYPE, MDB_val&)>::_FUN(Nan::NAN_METHOD_ARGS_TYPE, MDB_val &)>,
freeData=0x7ffff48befc0 <TxnWrap::<lambda(MDB_val&)>::_FUN(MDB_val &)>) at ../src/txn.cpp:278
#3 0x00007ffff48b9afc in Nan::imp::FunctionCallbackWrapper (info=...) at ../../nan/nan_callbacks_12_inl.h:174
#4 0x0000000000a9df83 in v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) ()
#5 0x0000000000b14fac in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#6 0x0000000000b15bff in v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) ()
I guess this could be a bug in GCC or lmdb itself and not the bindings. I use GCC 7.3.1 as provided in Fedora 27.
Using a modified binding.gyp to remove the default -O3 and use -O2 instead makes everything run smoothly again. The issue also does not occur with a legacy GCC 3.4.6.
"cflags": [
"-fPIC",
"-O2"
],
"cflags!": [
"-O3"
],
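(After editing binding.gyp this way, run node-gyp rebuild in the node-lmdb directory so the changed flags actually take effect.)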
Thank you @mar-v-in ― yes, I can reproduce the problem with -O3. It's a bit scary as it comes from lmdb, from a part where basically everything is optimized out. When I touch the code, it is no longer optimized out, and the problem is no longer present.
It would be interesting to see whether the same issue can be reproduced in plain C. If it can, this is an lmdb bug; otherwise it is something in node-lmdb.
While the investigation is ongoing, I agree to drop the -O3 flag.
@mar-v-in Can you confirm that this C snippet reproduces the same crash when using an -O3-compiled lmdb: https://pastebin.com/DeQ38WfC
I asked around in #openldap and they pointed me to an ITS (an entry in the OpenLDAP Issue Tracking System) which was fixed just a few days ago. So I just updated the LMDB dependency and the problem is gone.
@paberr @mar-v-in Can you guys confirm that the issue has been fixed?
Yes, it seems to be fixed now.
Awesome, thank you. I'm closing this issue now. :)
@mar-v-in and I found another, hard-to-reproduce SIGSEGV. On several Linux machines, the snippet below leads to a SIGSEGV; on my MacBook, however, it works without any problems.
It seems like the length of the buffers and the number of put operations both influence the result.
Here is also the stack trace:
Snippet: