Venemo / node-lmdb

Node.js binding for lmdb
MIT License

Cursor is leaking memory #31

Closed joernroeder closed 8 years ago

joernroeder commented 10 years ago

Hey, I've discovered a memory leak in lmdb.Cursor and updated the cursor example to reproduce the leak outside of my application. After that I used the Instruments memory-leak template, attached it to the running node process to dig a bit deeper, and saw an obvious, continuously growing number of CustomExternalStringResource allocations. I hope the following screenshots help to fix this leak and make this lmdb module ready for long-term, heavy usage.

cheers

[Screenshot: screen shot 2014-08-18 at 4 36 13 am]

[Screenshot: screen shot 2014-08-18 at 4 37 03 am]

Venemo commented 10 years ago

Hi,

Thanks for reporting this! It is an interesting issue, because it's not as simple as it seems.

What does this mean?

The issue might be simply with the V8 garbage collector not kicking in, but I'll look into it deeper to see what I can do. It may still very well be my fault.

Venemo commented 10 years ago

After a quick look at your code, it appears that txn.put is much more likely to leak than the cursor operations you do. When you don't use cursors, just transactions, does the leak still happen?
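
For reference, a cursor-free test along those lines might look roughly like this (a hedged sketch: `env`, `dbi` and the key pattern are illustrative and not taken from the actual test case; only beginTxn/putString/commit come from this thread):

var i = 0;

function step() {
    var txn = env.beginTxn();
    txn.putString(dbi, 'key-' + (i % 100000), 'value ' + i);   // transactions only, no Cursor
    txn.commit();

    if (++i % 100000 === 0) {
        console.log('written', i);
    }
    setImmediate(step);   // yield to the event loop so the GC gets a chance to run
}

step();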

joernroeder commented 10 years ago

I will create another test case without the cursor. But during my digging through the source and playing around, I used txn.putString to fill up the database to its limit. After that I used the cursor to iterate over the database, and my allocations started to grow. Therefore (and because of the stack in the second screenshot -> CursorWrap::getCommon) I currently think the cursor is leaking …

The following screenshot shows two iterations of the following test case:

  1. add items to the db
  2. iterate over all items (using the cursor)
  3. close db
  4. remove data.mdb and restart

[Screenshot: screen shot 2014-08-18 at 1 45 46 am]

The CustomExternalStringResource allocations are marked in red…
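
For reference, steps 1–3 of that test case might look roughly like this (a hedged sketch: the variable names and the item count are illustrative; the Env/Dbi/Cursor calls are the ones used elsewhere in this thread):

var lmdb = require('node-lmdb');

var env = new lmdb.Env();
env.open({ path: './test' });
var dbi = env.openDbi({ name: 'db1', create: true });

// 1. add items to the db
var writeTxn = env.beginTxn();
for (var i = 0; i < 100000; i++) {
    writeTxn.putString(dbi, 'key-' + i, 'value ' + i);
}
writeTxn.commit();

// 2. iterate over all items using the cursor
var readTxn = env.beginTxn({ readOnly: true });
var cursor = new lmdb.Cursor(readTxn, dbi);
for (var key = cursor.goToFirst(); key; key = cursor.goToNext()) {
    // iterating alone is enough to watch the allocations grow
}
cursor.close();
readTxn.commit();

// 3. close the db
dbi.close();
env.close();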

Venemo commented 10 years ago

I see. Okay, I accept that the cursor might be doing something bad. My guess (but this is just a guess) is that there's something in there that prevents V8's GC from collecting the string resources (and/or the cursor instance itself).

russtee commented 8 years ago

Hello there,

I'm getting this too. It was happening on Node v0.10.29 with an old version of the lib, so we've ported everything to v5.6.0 with the latest lib, and it's still there.

On a 4G VM, with a tight loop and a db of around 30M keys, I can exhaust the machine's memory in around 30s.

Just iterating the cursor is enough.

Here's a code snippet:

// `e` is an open Env and `db` an open Dbi (see the full example later in the thread);
// `log` is the application's logger. LOG_EVERY's value is assumed.
var LOG_EVERY = 1000000;
var done = 0;

var txn = e.beginTxn({readOnly: true});
var cursor = new nodelmdb.Cursor(txn, db);
for (var c = cursor.goToFirst(); c; c = cursor.goToNext()) {
    if (0 === done++ % LOG_EVERY) {
        log.info("LOOP: done=" + done + ", c=" + c);
    }
}

Any help would be much appreciated, as we've just moved everything to node-lmdb because of the support for synchronous get/put/del as well as sub dirs, and now we're stuck.

Thanks in advance,

Russ

russtee commented 8 years ago

Some experimental results.

I had an idea that the tight sync loop could be stopping the node garbage collector from running.

To test this theory I implemented the above using the streams API and piped the result into /dev/null. Unfortunately the same memory leak occurs.

I then added a forced call to the node garbage collector every 100,000 items and that made no difference either.
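
Roughly, the streamed variant with the forced GC might have looked something like this (a hedged reconstruction, not the exact code: it reuses nodelmdb, e and db from the snippet above and assumes the process is started with node --expose-gc so that global.gc is available):

var stream = require('stream');
var fs = require('fs');

var txn = e.beginTxn({ readOnly: true });
var cursor = new nodelmdb.Cursor(txn, db);
var count = 0;

var keys = new stream.Readable();
keys._read = function () {
    var key = (count === 0) ? cursor.goToFirst() : cursor.goToNext();
    if (!key) {
        this.push(null);                  // end of the database
        return;
    }
    count++;
    if (count % 100000 === 0 && global.gc) {
        global.gc();                      // forcing a GC made no difference
    }
    this.push(key + '\n');
};

keys.pipe(fs.createWriteStream('/dev/null'));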

Venemo commented 8 years ago

@russtee Thanks for chiming in to this discussion. Can you give me a minimal test case that will reproduce the memory leak for me?

Venemo commented 8 years ago

(As an aside, this might have to do with #21 but could be something else entirely.)

russtee commented 8 years ago

This kills my machine in about 30s:

var nodelmdb = require('node-lmdb');

var e = new nodelmdb.Env();
e.open({
    path: './test'
});

var db = e.openDbi({
    name: 'db1',
    create: true
});

var txn = e.beginTxn();
txn.putString(db,'akey','whatever');
txn.commit();

var txn = e.beginTxn();
var cursor = new nodelmdb.Cursor(txn,db);

for (var i=0; true; i++) {
    cursor.goToFirst();
    cursor.goToNext();
    if (0 === i % 1000000) {
        console.log("done "+i);
    }
}

// won't get this far
cursor.close();
txn.commit();
db.close();
e.close();

Venemo commented 8 years ago

@russtee Yup, you got me. The machine is swapping like hell! So basically the leak doesn't have anything to do with creating and destroying cursors; simply traversing back and forth using a single cursor can eat all your memory...?

Venemo commented 8 years ago

I've found one possible memory leak: on each call to any of the goToSomething-functions, the cursor wrapper will create a V8 handle for the current key and data pointed to by the cursor (by calling keyToHandle which in turn calls valToString). Unfortunately this means that it mindlessly creates a bunch of CustomExternalStringResource instances, and V8 seems to never release those.
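
A rough way to watch this from plain Node, without Instruments or valgrind, is to log the resident set size while traversing (a hedged sketch reusing nodelmdb, e and db from russtee's reproduction above; the loop and logging intervals are arbitrary):

var txn = e.beginTxn({ readOnly: true });
var cursor = new nodelmdb.Cursor(txn, db);

for (var i = 1; i <= 10000000; i++) {
    cursor.goToFirst();    // each goTo* call allocates another CustomExternalStringResource
    cursor.goToNext();
    if (i % 1000000 === 0) {
        console.log(i, Math.round(process.memoryUsage().rss / 1048576) + ' MB rss');
    }
}

cursor.close();
txn.commit();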

russtee commented 8 years ago

I think that's it. When I run it with valgrind

valgrind --tool=memcheck --leak-check=full node bang.js

I get the following, which seems pretty conclusive:

==42528== 77,568,624 bytes in 3,232,026 blocks are definitely lost in loss record 513 of 513
==42528==    at 0x1016C1EBB: malloc (in /usr/local/Cellar/valgrind/3.11.0/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==42528==    by 0x1026DC33D: operator new(unsigned long) (in /usr/lib/libc++abi.dylib)
==42528==    by 0x1045C2831: valToString(MDB_val&) (misc.cpp:91)
==42528==    by 0x1045C50FC: CursorWrap::getCommon(Nan::FunctionCallbackInfo<v8::Value> const&, MDB_cursor_op, void (*)(CursorWrap*, Nan::FunctionCallbackInfo<v8::Value> const&, MDB_val&), void (*)(CursorWrap*, Nan::FunctionCallbackInfo<v8::Value> const&, MDB_val&), void (*)(CursorWrap*, Nan::FunctionCallbackInfo<v8::Value> const&, MDB_val&), v8::Local<v8::Value> (*)(MDB_val&)) (cursor.cpp:129)
==42528==    by 0x1045C1DFC: Nan::imp::FunctionCallbackWrapper(v8::FunctionCallbackInfo<v8::Value> const&) (nan_callbacks_12_inl.h:174)
==42528==    by 0x20A473A6F207: ???
==42528==    by 0x20A473A7026E: ???
==42528==    by 0x20A473934A9B: ???
==42528==    by 0x20A473A4DD1B: ???
==42528==    by 0x20A473A4658A: ???
==42528==    by 0x20A473A433BA: ???
==42528==    by 0x20A473A38C8D: ???
==42528== 
==42528== LEAK SUMMARY:
==42528==    definitely lost: 77,866,116 bytes in 3,244,334 blocks
==42528==    indirectly lost: 5,292 bytes in 22 blocks
==42528==      possibly lost: 1,057,068 bytes in 160 blocks
==42528==    still reachable: 3,428,852 bytes in 22,942 blocks
==42528==                       of which reachable via heuristic:
==42528==                         newarray           : 1,640 bytes in 1 blocks
==42528==         suppressed: 59,363 bytes in 186 blocks
==42528== Reachable blocks (those to which a pointer was found) are not shown.
==42528== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==42528== 
==42528== For counts of detected and suppressed errors, rerun with: -v
==42528== Use --track-origins=yes to see where uninitialised values come from
==42528== ERROR SUMMARY: 178 errors from 178 contexts (suppressed: 24 from 24)

russtee commented 8 years ago

this seems to fix it:

Handle<Value> valToString(MDB_val &data) {
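    // note: this copies the value into a fresh V8 string instead of wrapping it in a
    // CustomExternalStringResource, which is the extra copy mentioned below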
    return String::NewFromTwoByte(Isolate::GetCurrent(),
                                  (uint16_t*)(data.mv_data),
                                  v8::String::kNormalString,
                                  (data.mv_size / sizeof(uint16_t) - 1));
}

With this change it no longer leaks memory. I need to test it with our wider systems.

Venemo commented 8 years ago

@russtee Congrats, you just defeated the 'zero-copy' promise of node-lmdb and introduced an extra copy to every string. :)

Venemo commented 8 years ago

@russtee The solution is very banal though: adding delete this; to the Dispose method of CustomExternalStringResource. Looks like V8 calls Dispose and expects that to get rid of the instance as well, so never calls the destructor.

Venemo commented 8 years ago

I committed and pushed a change which makes the memory leak go away for me. Can you please give it a try and let me know whether or not it worked?

russtee commented 8 years ago

I don't think the 'zero-copy' promise is a real constraining factor in a typical Node.js app. V8 copies things very wastefully, and IO tends to be the major bottleneck.

Great news on the fix, will give it a try. Hopefully we can finally complete the switch to node-lmdb.

Venemo commented 8 years ago

@russtee Actually, one of the reasons why lmdb is fast is the zero-copy promise: not copying something is always faster than copying it. It's also a good thing because, since the lmdb database is a memory-mapped file, you let the OS decide which parts need to reside in memory and which can be paged out to disk, instead of insisting on putting everything into memory (which, if it runs out, can end up in swap anyway).

russtee commented 8 years ago

This seems fixed for both the specific test and our wider application. Thanks.

Venemo commented 8 years ago

@russtee Cool, I'll close this issue then. :)