An interesting observation is that adding or removing one byte to the Uint8Arrays (i.e., varying their length) causes the snippet to work properly on our test machine.
@paberr Why are you using a Uint8Array with putString? If you want to work with binary data, I would suggest the node Buffer instead.
We use Uint8Array since we use the same application code for node and the browser. However, the program also crashes when we replace the Uint8Array with a Buffer.
I cannot reproduce the crash. To me it just outputs "All done" and exits normally. I use node v8.9.4 on Fedora 27. Which node version and OS do you use, and are you sure you use the latest node-lmdb?
I just noticed you use the Uint8Array as a key. This has never been tested, especially since your configuration expects a string key. If you want to use a binary key, you should specify keyIsBuffer: true to env.openDbi and use a node Buffer. The fact that it doesn't throw an error for your input is a bug in itself.
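For completeness, this is roughly what I mean (untested sketch; the database name and path are made up, but keyIsBuffer, openDbi and putBinary are the node-lmdb API):

var lmdb = require('node-lmdb');

var env = new lmdb.Env();
env.open({ path: './testdata', maxDbs: 2 });

// Open the database with binary keys enabled.
var dbi = env.openDbi({ name: 'mydb', create: true, keyIsBuffer: true });

var txn = env.beginTxn();
// Both the key and the value are node Buffers here.
txn.putBinary(dbi, Buffer.from([1, 2, 3]), Buffer.from([4, 5, 6]));
txn.commit();

dbi.close();
env.close();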
So, I investigated a little. It looks like node thinks that a Uint8Array is a buffer: node::Buffer::HasInstance actually returns true when given the Uint8Array, and node::Buffer::Data and node::Buffer::Length then work with it correctly too.
Looking further into the code, it is no longer necessary to specify the key type to openDbi; in this case it will infer the key type for you.
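So something like this should also work without keyIsBuffer (rough sketch; I'm assuming here that the inference kicks in when a Buffer key is passed to putBinary):

// Same env as above; no keyIsBuffer needed when a Buffer key is used.
var dbi = env.openDbi({ name: 'mydb', create: true });

var txn = env.beginTxn();
txn.putBinary(dbi, Buffer.from([1, 2, 3]), Buffer.from([4, 5, 6]));
txn.commit();
dbi.close();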
Sorry for the noise. So far I have not been able to reproduce the problem. Hopefully knowing the OS and node version will help.
It would also help to see the stack trace from a debug build, i.e. node-gyp configure --debug.
I was able to do some tests with the native code. The problem seems to occur only when compiling with gcc and the -O3 flag (-O2 or clang are fine). gcc with -O3 seems to be the default for release builds on Linux systems, and setting -O2 in binding.gyp is simply ignored. Debug builds on Linux default to -O0 (even if -O3 is set in binding.gyp), so just enabling debug makes the bug no longer visible.
I manually did a debug build with -O3 and -g; this is the resulting stack trace:
Thread 1 "node" received signal SIGSEGV, Segmentation fault.
mdb_cursor_put (mc=<optimized out>, key=0x7fffffffc4e0, data=0x7fffffffc4f0, flags=<optimized out>) at ../dependencies/lmdb/libraries/liblmdb/mdb.c:7652
7652 mp->mp_ptrs[i] = fp->mp_ptrs[i] + offset;
(gdb) bt
#0 mdb_cursor_put (mc=<optimized out>, key=0x7fffffffc4e0, data=0x7fffffffc4f0, flags=<optimized out>) at ../dependencies/lmdb/libraries/liblmdb/mdb.c:7652
#1 0x00007ffff48b88d6 in mdb_put (txn=0x228c040, dbi=2, key=key@entry=0x7fffffffc4e0, data=data@entry=0x7fffffffc4f0, flags=0) at ../dependencies/lmdb/libraries/liblmdb/mdb.c:9847
#2 0x00007ffff48c0296 in TxnWrap::putCommon (info=..., fillFunc=0x7ffff48bf190 <TxnWrap::<lambda(Nan::NAN_METHOD_ARGS_TYPE, MDB_val&)>::_FUN(Nan::NAN_METHOD_ARGS_TYPE, MDB_val &)>,
freeData=0x7ffff48befc0 <TxnWrap::<lambda(MDB_val&)>::_FUN(MDB_val &)>) at ../src/txn.cpp:278
#3 0x00007ffff48b9afc in Nan::imp::FunctionCallbackWrapper (info=...) at ../../nan/nan_callbacks_12_inl.h:174
#4 0x0000000000a9df83 in v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) ()
#5 0x0000000000b14fac in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#6 0x0000000000b15bff in v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) ()
I guess this could be a bug in GCC or lmdb itself and not the bindings. I use GCC 7.3.1 as provided in Fedora 27.
Using a modified binding.gyp to remove the default -O3 and use -O2 instead makes everything run smoothly again. The issue also does not occur with a legacy GCC 3.4.6.
"cflags": [
"-fPIC",
"-O2"
],
"cflags!": [
"-O3"
],
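(After editing binding.gyp this way, run node-gyp rebuild in the node-lmdb directory so the changed flags actually take effect.)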
Thank you @mar-v-in ― yes, I can reproduce the problem with -O3. It's a bit scary as it comes from lmdb, from a part where basically everything is optimized out. When I touch the code, it is no longer optimized out, and the problem is no longer present.
It would be interesting to see whether the same issue can be reproduced in plain C. If it can, this is an lmdb bug; otherwise it is something in node-lmdb.
While the investigation is ongoing, I agree to drop the -O3 flag.
@mar-v-in Can you confirm that this C snippet reproduces the same crash when using an -O3-compiled lmdb: https://pastebin.com/DeQ38WfC
I asked around in #openldap and they pointed me to an ITS (an entry in the OpenLDAP Issue Tracking System) which was fixed just a few days ago. So I just updated the LMDB dependency and the problem is gone.
@paberr @mar-v-in Can you guys confirm that the issue has been fixed?
Yes, it seems to be fixed now.
Awesome, thank you. I'm closing this issue now. :)
@mar-v-in and I found another, hard-to-reproduce SIGSEGV. On several Linux machines, the snippet below leads to a SIGSEGV; on my MacBook, however, it works without any problems.
It seems like the length of the buffers and the number of put operations both influence the result.
Here is also the stack trace:
Snippet: