RobotWebTools / rclnodejs

Node.js version of ROS 2.0 client
https://docs.ros.org/en/jazzy/Concepts/Basic/About-Client-Libraries.html#community-maintained
Apache License 2.0
325 stars 71 forks source link

memory leaks in RclHandle and ref-napi #674

Closed koonpeng closed 4 years ago

koonpeng commented 4 years ago

Running in valgrind reports memory leaks in

==17222== 176 bytes in 1 blocks are definitely lost in loss record 69 of 102
==17222==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17222==    by 0xA3BFC5C: rclnodejs::RclHandle::New(Nan::FunctionCallbackInfo<v8::Value> const&) (in /home/teokp/source/rclnodejs/build/Release/rclnodejs.node)
==17222==    by 0xA3BF72E: Nan::imp::FunctionCallbackWrapper(v8::FunctionCallbackInfo<v8::Value> const&) (in /home/teokp/source/rclnodejs/build/Release/rclnodejs.node)
==17222==    by 0xBEF34A: v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<true>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) (in /home/teokp/.nvm/versions/node/v12.18.2/bin/node)
==17222==    by 0xBF023C: v8::internal::Builtins::InvokeApiFunction(v8::internal::Isolate*, bool, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, v8::internal::Handle<v8::internal::HeapObject>) (in /home/teokp/.nvm/versions/node/v12.18.2/bin/node)
==17222==    by 0xCC4005: v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, v8::internal::(anonymous namespace)::InvokeParams const&) (in /home/teokp/.nvm/versions/node/v12.18.2/bin/node)
==17222==    by 0xCC4609: v8::internal::Execution::New(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) (in /home/teokp/.nvm/versions/node/v12.18.2/bin/node)
==17222==    by 0xB939AF: v8::Function::NewInstanceWithSideEffectType(v8::Local<v8::Context>, int, v8::Local<v8::Value>*, v8::SideEffectType) const (in /home/teokp/.nvm/versions/node/v12.18.2/bin/node)
==17222==    by 0xB93C6B: v8::Function::NewInstance(v8::Local<v8::Context>, int, v8::Local<v8::Value>*) const (in /home/teokp/.nvm/versions/node/v12.18.2/bin/node)
==17222==    by 0xA3C0794: rclnodejs::RclHandle::NewInstance(void*, rclnodejs::RclHandle*, std::function<int ()>) (in /home/teokp/source/rclnodejs/build/Release/rclnodejs.node)
==17222==    by 0xA3BBBE2: rclnodejs::CreateNode(Nan::FunctionCallbackInfo<v8::Value> const&) (in /home/teokp/source/rclnodejs/build/Release/rclnodejs.node)
==17222==    by 0xA3A579E: Nan::imp::FunctionCallbackWrapper(v8::FunctionCallbackInfo<v8::Value> const&) (in /home/teokp/source/rclnodejs/build/Release/rclnodejs.node)

and

==17222== 80 bytes in 1 blocks are definitely lost in loss record 50 of 102
==17222==    at 0x4C3017F: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==17222==    by 0x9CCCA1: napi_create_reference (in /home/teokp/.nvm/versions/node/v12.18.2/bin/node)
==17222==    by 0x136FF2D9: (anonymous namespace)::GetBufferData(Napi::Value) (in /home/teokp/source/rclnodejs/node_modules/ref-napi/prebuilds/linux-x64/node.napi.node)
==17222==    by 0x1370168A: (anonymous namespace)::WritePointer(Napi::CallbackInfo const&) (in /home/teokp/source/rclnodejs/node_modules/ref-napi/prebuilds/linux-x64/node.napi.node)
==17222==    by 0x13707EBA: Napi::details::CallbackData<void (*)(Napi::CallbackInfo const&), void>::Wrapper(napi_env__*, napi_callback_info__*) (in /home/teokp/source/rclnodejs/node_modules/ref-napi/prebuilds/linux-x64/node.napi.node)
==17222==    by 0x9C5874: v8impl::(anonymous namespace)::FunctionCallbackWrapper::Invoke(v8::FunctionCallbackInfo<v8::Value> const&) (in /home/teokp/.nvm/versions/node/v12.18.2/bin/node)
==17222==    by 0xBEE088: v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) (in /home/teokp/.nvm/versions/node/v12.18.2/bin/node)
==17222==    by 0xBEFE76: v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) (in /home/teokp/.nvm/versions/node/v12.18.2/bin/node)
==17222==    by 0x13CCF78: ??? (in /home/teokp/.nvm/versions/node/v12.18.2/bin/node)
==17222==    by 0x1352823: ??? (in /home/teokp/.nvm/versions/node/v12.18.2/bin/node)
==17222==    by 0x1352823: ??? (in /home/teokp/.nvm/versions/node/v12.18.2/bin/node)
==17222==    by 0x1352823: ??? (in /home/teokp/.nvm/versions/node/v12.18.2/bin/node)

For the ref-napi leaks, it looks like when a new buffer is created at

  _runWithMessageType(Type, callback) {
    let message = new Type();

    callback(message.toRawROS(), () => {
      let result = new Type();
      result.deserialize(message.refObject);

      return result;
    });

    Type.destoryRawROS(message);
  }

this creates a new "StructType" of ref-struct-di, which creates some buffers using ref-napi, the underlying ref to the buffer is obtained through message.toRawROS, this buffer is then passed to the native function RclTake which calls rcl_take, writing the ros message into the buffer. This buffer still contains the "default allocated" data made by ref-napi, and so we lose the pointers previously there. There could also be danger of double free as ref-napi does it's own management of the pointers but we are also managing it in rclnodejs.

minggangw commented 4 years ago

Sorry for the delay, and thanks for investigating the memory leak issue which I think it's very complicated for every C++ project. Some thoughts:

==17222== by 0xA3BBBE2: rclnodejs::CreateNode(Nan::FunctionCallbackInfo const&) (in /home/teokp/source/rclnodejs/build/Release/rclnodejs.node)

I noticed that the leak happens when creating a node, I'm wondering does it also happen when creating other entities, e.g. a publisher or client? Theoretically, the instance of RclHandle should be managed by v8 and will not get leaked. @koonpeng would you please introduce some details about how you were running valgrind, like the case you ran and the command with what params?

This buffer still contains the "default allocated" data made by ref-napi, and so we lose the pointers previously there.

For my understanding, we don't need to take care of the memory allocated by ref-napi (correct me if I'm wrong), and as you said we pass the pointer allocated by ref-napi which will be written data through calling rcl_take. Do you have a concrete example to show when the leak happens or its pre-condition?

Thank you for finding these critical problems in rclnodejs and digging into it, the information you offered is very helpful. I hope we could fix this ASAP (also the #673 which is also a memory usage issue).

koonpeng commented 4 years ago

This is the command i used to run valgrind, I'm using the tests I created for the use-after-free bug.

valgrind --leak-check=full node --expose-gc node_modules/.bin/mocha test/test-promise-gc.js -f 'std_msgs/msg/UInt8MultiArray'

I just ran valgrind on a minimal test script

require('ref-napi');

and it reports the same leak as well, maybe it does not involve rclnodejs, I will try to look more into it.

koonpeng commented 4 years ago

I think I identified the leak from createNode, ShadowNode has accessors for handle and for some reason, assign the object to the persistent handle causes the weak callback installed by Nan::ObjectWrap to not get called. I simply removed the accessors and it seems fix it, since the RclHandle lifetime is handled by v8 we shouldnt need to destroy it ourselves.

I'm still looking at other leaks and invalid access, eventually I would probably run the test suite with asan to make sure things are working.

minggangw commented 4 years ago

I did some debugging and also found the dtor of ShadowNode not invoked, but I am pretty sure that it once worked. Not sure when the regression was introduced.