RobotWebTools / rclnodejs

Node.js version of ROS 2.0 client
https://docs.ros.org/en/humble/Concepts/Basic/About-Client-Libraries.html?highlight=rclnodejs#community-maintained
Apache License 2.0
319 stars 70 forks source link

Unit tests fail on Node.js v14.x and v15.x - Check failed: result.second. #717

Closed minggangw closed 2 years ago

minggangw commented 3 years ago

When using the latest Node.js LTS Fermium v14.15.0, the unit test began to fail consistently and it also reproduces on v15.x versions. Backtrace is

#
# Fatal error in , line 0
# Check failed: result.second.
#
#
#
#FailureMessage Object: 0x7fff04fd3a70
 1: 0xa70141  [node]
 2: 0x19cf084 V8_Fatal(char const*, ...) [node]
 3: 0xe594c9 v8::internal::GlobalBackingStoreRegistry::Register(std::shared_ptr<v8::internal::BackingStore>) [node]
 4: 0xba4a18 v8::ArrayBuffer::GetBackingStore() [node]
 5: 0x9c18f0 napi_get_typedarray_info [node]
 6: 0x7fa130ce70ef  [/root/rclnodejs/node_modules/ref-napi/prebuilds/linux-x64/node.napi.node]
 7: 0x7fa130ce7918  [/root/rclnodejs/node_modules/ref-napi/prebuilds/linux-x64/node.napi.node]
 8: 0x7fa130ce7bbb  [/root/rclnodejs/node_modules/ref-napi/prebuilds/linux-x64/node.napi.node]
 9: 0x7fa130cefceb Napi::details::CallbackData<void (*)(Napi::CallbackInfo const&), void>::Wrapper(napi_env__*, napi_callback_info__*) [/root/rclnodejs/node_modules/ref-napi/prebuilds/linux-x64/node.napi.node]

and the corresponding PR #716

minggangw commented 3 years ago

The failed CHECK() is https://github.com/nodejs/node/blob/c55f661551d7368778ae6f9adc584467151bce8a/deps/v8/src/objects/backing-store.cc#L695

minggangw commented 3 years ago

I did some investigation locally, this crash happens when we use TypedArray when receiving topics and calling CreateArrayBufferFromAddress to allocate memory, backtrace below:

#
# Fatal error in , line 0
# Check failed: length == backing_store->byte_length().
#
#
#
#FailureMessage Object: 0x7fffffff8c00
 1: 0xa70141  [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
 2: 0x19cf084 V8_Fatal(char const*, ...) [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
 3: 0xe599ea v8::internal::GlobalBackingStoreRegistry::Lookup(void*, unsigned long) [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
 4: 0xba44af  [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
 5: 0xba46b4 v8::ArrayBuffer::New(v8::Isolate*, void*, unsigned long, v8::ArrayBufferCreationMode) [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
 6: 0x7ffff52134af rclnodejs::CreateArrayBufferFromAddress(Nan::FunctionCallbackInfo<v8::Value> const&) [/home/minggang/proj/ros2/rclnodejs-1/build/Release/rclnodejs.node]
 7: 0x7ffff52049ac  [/home/minggang/proj/ros2/rclnodejs-1/build/Release/rclnodejs.node]
 8: 0xbe369b  [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
 9: 0xbe4c46  [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
10: 0xbe52c6 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
11: 0x13ff259  [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]

Thread 1 "node" received signal SIGILL, Illegal instruction.
minggangw commented 3 years ago

I think I may find the root cause of this issue.

Since v8 8.4.371(node14.15.0), there is a new static constructor used to create a v8::ArrayBuffer object (https://v8docs.nodesource.com/node-14.15/d5/d6e/classv8_1_1_array_buffer.html#ab6957a65e835a906e166aadd83e604c2), and per v8 document, we have to create a v8::BackingStore in order to pass a pointer of existing memory block. Although, the compiling is successful by using the deprecated ctor, there will be an internal state error

The failed CHECK() is https://github.com/nodejs/node/blob/c55f661551d7368778ae6f9adc584467151bce8a/deps/v8/src/objects/backing-store.cc#L695

minggangw commented 3 years ago

I found some relevant issues:

minggangw commented 3 years ago

Actually, this issue is ever reported #694 in Aug

minggangw commented 3 years ago

New discussions from https://github.com/node-ffi-napi/ref-napi/issues/54

wayneparrott commented 3 years ago

rclnodejs on window 10, node 16 experiences the following error when sending array data. Does this look familiar? Anything I can help with in investigating this in more depth?

#FailureMessage Object: 00000036200FE3C0
 1: 00007FF60FFDB05F v8::internal::CodeObjectRegistry::~CodeObjectRegistry+111311
 2: 00007FF60FEFB2DF v8::CFunction::ReturnInfo+2111
 3: 00007FF610C0C452 V8_Fatal+162
 4: 00007FF61065495F v8::internal::BackingStore::GrowWasmMemoryInPlace+639
 5: 00007FF6108DD11C v8::Isolate::LocaleConfigurationChangeNotification+268
 6: 00007FF6108DE1D8 v8::ArrayBuffer::New+232
 7: 00007FFCC4448E39 rclnodejs::CreateArrayBufferFromAddress+217 [c:\dev\workspaces\driveai\rclnodejs\src\rcl_bindings.cpp]:L1324
 8: 00007FFCC4431E47 Nan::imp::FunctionCallbackWrapper+231 [c:\dev\workspaces\driveai\rclnodejs\node_modules\nan\nan_callbacks_12_inl.h]:L177
 9: 00007FF610897AA9 v8::internal::Builtins::builtin_handle+318217
10: 00007FF610897041 v8::internal::Builtins::builtin_handle+315553
11: 00007FF610897331 v8::internal::Builtins::builtin_handle+316305
12: 00007FF610897173 v8::internal::Builtins::builtin_handle+315859
13: 00007FF61096FD11 v8::internal::SetupIsolateDelegate::SetupHeap+463377
14: 00007FF610907E39 v8::internal::SetupIsolateDelegate::SetupHeap+37689
15: 00007FF610907E39 v8::internal::SetupIsolateDelegate::SetupHeap+37689
16: 00007FF610907E39 v8::internal::SetupIsolateDelegate::SetupHeap+37689
17: 00007FF610907E39 v8::internal::SetupIsolateDelegate::SetupHeap+37689
18: 00007FF610907E39 v8::internal::SetupIsolateDelegate::SetupHeap+37689
19: 00007FF610907E39 v8::internal::SetupIsolateDelegate::SetupHeap+37689
20: 00007FF610981DA6 v8::internal::SetupIsolateDelegate::SetupHeap+537254
21: 00007FF610907E39 v8::internal::SetupIsolateDelegate::SetupHeap+37689
22: 00007FF61090608F v8::internal::SetupIsolateDelegate::SetupHeap+30095
23: 00007FF610905C8B v8::internal::SetupIsolateDelegate::SetupHeap+29067
24: 00007FF6107BE0E2 v8::internal::Execution::CallWasm+1698
25: 00007FF6107BD91F v8::internal::Execution::Call+191
26: 00007FF6108C5091 v8::Function::Call+609
27: 00007FF61000A881 node::CallbackScope::~CallbackScope+1729
28: 00007FF61000AD11 node::MakeCallback+241
29: 00007FF61000AEB5 node::MakeCallback+325
30: 00007FF61000ABFB node::MakeCallback+139
31: 00007FFCC4453BCD rclnodejs::ShadowNode::Execute+893 [c:\dev\workspaces\driveai\rclnodejs\src\shadow_node.cpp]:L123
32: 00007FFCC4434330 rclnodejs::Executor::ExecuteReadyHandles+288 [c:\dev\workspaces\driveai\rclnodejs\src\executor.cpp]:L258
33: 00007FF61003B02B uv_async_send+331
34: 00007FF61003A7BC uv_loop_init+1292
35: 00007FF61003A95A uv_run+202
36: 00007FF610009C24 node::SpinEventLoop+308
37: 00007FF60FF25973 v8::internal::AsmJsScanner::GetIdentifierString+53635
38: 00007FF60FFA09D7 node::Start+215
39: 00007FF60FDC84FC RC4_options+347132
40: 00007FF610E9067C v8::internal::compiler::RepresentationChanger::Uint32OverflowOperatorFor+151692
41: 00007FFCEA8554E0 BaseThreadInitThunk+16
42: 00007FFCEBB8485B RtlUserThreadStart+43
minggangw commented 3 years ago

Looking through the back trace, I guess this has the same root cause with what this issue described originally. The ref-napi leads to this crash, which is clear, but the owner of iref-napi worries about the backwards-compatibility. So I think we have two options here:

Personally, I prefer the second one because it remove dependency completely, the ref-napi is still a potential risk when v8 upgrades.

wayneparrott commented 3 years ago

Out of curiosity I replaced ref-napi with this recently patched version https://github.com/alphacep/ref-napi. On windows it fails with this stack-trace when deserializing an array or complex msg type:

C:\dev\workspaces\driveai\rclnodejs\example>node publisher-message-example.js
Publish 1 messages.
Publish 2 messages.
Publish 3 messages.
Publish 4 messages.
Publish 5 messages.
Publish 6 messages.
C:\dev\workspaces\driveai\rclnodejs\lib\publisher.js:59
      rclnodejs.publish(this._handle, rawMessage);
                ^

Error: publisher pointer is invalid, at C:\ci\ws\src\ros2\rcl\rcl\src\rcl\publisher.c:434
    at Publisher.publish (C:\dev\workspaces\driveai\rclnodejs\lib\publisher.js:59:17)
    at Timeout._onTimeout (C:\dev\workspaces\driveai\rclnodejs\example\publisher-message-example.js:32:17)
    at listOnTimeout (node:internal/timers:557:17)
    at processTimers (node:internal/timers:500:7)

C:\dev\workspaces\driveai\rclnodejs\example>node subscription-message-example.js
C:\dev\workspaces\driveai\rclnodejs\node_modules\ref-napi\lib\ref.js:786
  const rtn = exports._reinterpret(buffer, size, offset || 0);
                      ^

Error: reinterpret: Cannot reinterpret from nullptr pointer
    at Object.reinterpret (C:\dev\workspaces\driveai\rclnodejs\node_modules\ref-napi\lib\ref.js:786:23)
    at ArrayType.getter (C:\dev\workspaces\driveai\rclnodejs\node_modules\ref-array-di\lib\array.js:256:19)
    at ArrayType.get (C:\dev\workspaces\driveai\rclnodejs\node_modules\array-index\index.js:171:32)
    at JointStateWrapper.deserialize (C:\dev\workspaces\driveai\rclnodejs\generated\sensor_msgs\sensor_msgs__msg__JointState.js:126:51)
    at C:\dev\workspaces\driveai\rclnodejs\lib\node.js:1614:14
    at C:\dev\workspaces\driveai\rclnodejs\lib\node.js:185:42
    at Node._runWithMessageType (C:\dev\workspaces\driveai\rclnodejs\lib\node.js:1612:5)
    at C:\dev\workspaces\driveai\rclnodejs\lib\node.js:180:12
    at Array.forEach (<anonymous>)
    at Node.execute (C:\dev\workspaces\driveai\rclnodejs\lib\node.js:172:24)
[foonathan::memory] Allocator foonathan::memory::new_allocator (at 0000000000000000) leaked 1856 bytes.
[foonathan::memory] Allocator foonathan::memory::heap_allocator (at 0000000000000000) leaked 17816 bytes.
minggangw commented 3 years ago

Thanks for trying the patches out, it seems that we have to make some changes accordingly. We may consider applying #723 (the situation may have changed since it was implemented in last year). BTW, what is your nodejs versoin?

wayneparrott commented 3 years ago

opps accidentally omitted that the stack trace above was generated on ros2 foxy, window 10, vsc 2017, node 16.3.0.

koonpeng commented 3 years ago

I remember @koonpeng ever submitted a PR #774 to get rid of the dependency of ref-napi

I have reservations about depending on that PR to fix this issue, it is very experimental and it breaks several apis of rclnodejs. I also don't think I have the time to work on it in the foreseeable future.

minggangw commented 3 years ago

Hi @koonpeng thanks for updating the status and it seems that we'd better try to fix the bug of ref-napi instead.

minggangw commented 2 years ago

After forking ref-napi, ref-struct-di and ref-array-di and making some changes, we can pass all the unit tests on latest galactic release now :smile: , if you want to experience rclnodejs with nodejs >= 14 please check out branch https://github.com/RobotWebTools/rclnodejs/tree/nodejs-16

minggangw commented 2 years ago

I think the issue was fixed, so close it.