Closed minggangw closed 2 years ago
I did some investigation locally, this crash happens when we use TypedArray when receiving topics and calling CreateArrayBufferFromAddress
to allocate memory, backtrace below:
#
# Fatal error in , line 0
# Check failed: length == backing_store->byte_length().
#
#
#
#FailureMessage Object: 0x7fffffff8c00
1: 0xa70141 [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
2: 0x19cf084 V8_Fatal(char const*, ...) [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
3: 0xe599ea v8::internal::GlobalBackingStoreRegistry::Lookup(void*, unsigned long) [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
4: 0xba44af [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
5: 0xba46b4 v8::ArrayBuffer::New(v8::Isolate*, void*, unsigned long, v8::ArrayBufferCreationMode) [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
6: 0x7ffff52134af rclnodejs::CreateArrayBufferFromAddress(Nan::FunctionCallbackInfo<v8::Value> const&) [/home/minggang/proj/ros2/rclnodejs-1/build/Release/rclnodejs.node]
7: 0x7ffff52049ac [/home/minggang/proj/ros2/rclnodejs-1/build/Release/rclnodejs.node]
8: 0xbe369b [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
9: 0xbe4c46 [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
10: 0xbe52c6 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
11: 0x13ff259 [/home/minggang/.nvm/versions/node/v14.15.0/bin/node]
Thread 1 "node" received signal SIGILL, Illegal instruction.
I think I may find the root cause of this issue.
Since v8 8.4.371(node14.15.0), there is a new static constructor used to create a v8::ArrayBuffer
object (https://v8docs.nodesource.com/node-14.15/d5/d6e/classv8_1_1_array_buffer.html#ab6957a65e835a906e166aadd83e604c2), and per v8 document, we have to create a v8::BackingStore
in order to pass a pointer of existing memory block. Although, the compiling is successful by using the deprecated ctor, there will be an internal state error
The failed
CHECK()
is https://github.com/nodejs/node/blob/c55f661551d7368778ae6f9adc584467151bce8a/deps/v8/src/objects/backing-store.cc#L695
I found some relevant issues:
Actually, this issue is ever reported #694 in Aug
New discussions from https://github.com/node-ffi-napi/ref-napi/issues/54
rclnodejs on window 10, node 16 experiences the following error when sending array data. Does this look familiar? Anything I can help with in investigating this in more depth?
#FailureMessage Object: 00000036200FE3C0
1: 00007FF60FFDB05F v8::internal::CodeObjectRegistry::~CodeObjectRegistry+111311
2: 00007FF60FEFB2DF v8::CFunction::ReturnInfo+2111
3: 00007FF610C0C452 V8_Fatal+162
4: 00007FF61065495F v8::internal::BackingStore::GrowWasmMemoryInPlace+639
5: 00007FF6108DD11C v8::Isolate::LocaleConfigurationChangeNotification+268
6: 00007FF6108DE1D8 v8::ArrayBuffer::New+232
7: 00007FFCC4448E39 rclnodejs::CreateArrayBufferFromAddress+217 [c:\dev\workspaces\driveai\rclnodejs\src\rcl_bindings.cpp]:L1324
8: 00007FFCC4431E47 Nan::imp::FunctionCallbackWrapper+231 [c:\dev\workspaces\driveai\rclnodejs\node_modules\nan\nan_callbacks_12_inl.h]:L177
9: 00007FF610897AA9 v8::internal::Builtins::builtin_handle+318217
10: 00007FF610897041 v8::internal::Builtins::builtin_handle+315553
11: 00007FF610897331 v8::internal::Builtins::builtin_handle+316305
12: 00007FF610897173 v8::internal::Builtins::builtin_handle+315859
13: 00007FF61096FD11 v8::internal::SetupIsolateDelegate::SetupHeap+463377
14: 00007FF610907E39 v8::internal::SetupIsolateDelegate::SetupHeap+37689
15: 00007FF610907E39 v8::internal::SetupIsolateDelegate::SetupHeap+37689
16: 00007FF610907E39 v8::internal::SetupIsolateDelegate::SetupHeap+37689
17: 00007FF610907E39 v8::internal::SetupIsolateDelegate::SetupHeap+37689
18: 00007FF610907E39 v8::internal::SetupIsolateDelegate::SetupHeap+37689
19: 00007FF610907E39 v8::internal::SetupIsolateDelegate::SetupHeap+37689
20: 00007FF610981DA6 v8::internal::SetupIsolateDelegate::SetupHeap+537254
21: 00007FF610907E39 v8::internal::SetupIsolateDelegate::SetupHeap+37689
22: 00007FF61090608F v8::internal::SetupIsolateDelegate::SetupHeap+30095
23: 00007FF610905C8B v8::internal::SetupIsolateDelegate::SetupHeap+29067
24: 00007FF6107BE0E2 v8::internal::Execution::CallWasm+1698
25: 00007FF6107BD91F v8::internal::Execution::Call+191
26: 00007FF6108C5091 v8::Function::Call+609
27: 00007FF61000A881 node::CallbackScope::~CallbackScope+1729
28: 00007FF61000AD11 node::MakeCallback+241
29: 00007FF61000AEB5 node::MakeCallback+325
30: 00007FF61000ABFB node::MakeCallback+139
31: 00007FFCC4453BCD rclnodejs::ShadowNode::Execute+893 [c:\dev\workspaces\driveai\rclnodejs\src\shadow_node.cpp]:L123
32: 00007FFCC4434330 rclnodejs::Executor::ExecuteReadyHandles+288 [c:\dev\workspaces\driveai\rclnodejs\src\executor.cpp]:L258
33: 00007FF61003B02B uv_async_send+331
34: 00007FF61003A7BC uv_loop_init+1292
35: 00007FF61003A95A uv_run+202
36: 00007FF610009C24 node::SpinEventLoop+308
37: 00007FF60FF25973 v8::internal::AsmJsScanner::GetIdentifierString+53635
38: 00007FF60FFA09D7 node::Start+215
39: 00007FF60FDC84FC RC4_options+347132
40: 00007FF610E9067C v8::internal::compiler::RepresentationChanger::Uint32OverflowOperatorFor+151692
41: 00007FFCEA8554E0 BaseThreadInitThunk+16
42: 00007FFCEBB8485B RtlUserThreadStart+43
Looking through the back trace, I guess this has the same root cause with what this issue described originally. The ref-napi
leads to this crash, which is clear, but the owner of iref-napi
worries about the backwards-compatibility. So I think we have two options here:
ref-napi
, e.g. offered by hereref-napi
Personally, I prefer the second one because it remove dependency completely, the ref-napi
is still a potential risk when v8 upgrades.
Out of curiosity I replaced ref-napi with this recently patched version https://github.com/alphacep/ref-napi. On windows it fails with this stack-trace when deserializing an array or complex msg type:
C:\dev\workspaces\driveai\rclnodejs\example>node publisher-message-example.js
Publish 1 messages.
Publish 2 messages.
Publish 3 messages.
Publish 4 messages.
Publish 5 messages.
Publish 6 messages.
C:\dev\workspaces\driveai\rclnodejs\lib\publisher.js:59
rclnodejs.publish(this._handle, rawMessage);
^
Error: publisher pointer is invalid, at C:\ci\ws\src\ros2\rcl\rcl\src\rcl\publisher.c:434
at Publisher.publish (C:\dev\workspaces\driveai\rclnodejs\lib\publisher.js:59:17)
at Timeout._onTimeout (C:\dev\workspaces\driveai\rclnodejs\example\publisher-message-example.js:32:17)
at listOnTimeout (node:internal/timers:557:17)
at processTimers (node:internal/timers:500:7)
C:\dev\workspaces\driveai\rclnodejs\example>node subscription-message-example.js
C:\dev\workspaces\driveai\rclnodejs\node_modules\ref-napi\lib\ref.js:786
const rtn = exports._reinterpret(buffer, size, offset || 0);
^
Error: reinterpret: Cannot reinterpret from nullptr pointer
at Object.reinterpret (C:\dev\workspaces\driveai\rclnodejs\node_modules\ref-napi\lib\ref.js:786:23)
at ArrayType.getter (C:\dev\workspaces\driveai\rclnodejs\node_modules\ref-array-di\lib\array.js:256:19)
at ArrayType.get (C:\dev\workspaces\driveai\rclnodejs\node_modules\array-index\index.js:171:32)
at JointStateWrapper.deserialize (C:\dev\workspaces\driveai\rclnodejs\generated\sensor_msgs\sensor_msgs__msg__JointState.js:126:51)
at C:\dev\workspaces\driveai\rclnodejs\lib\node.js:1614:14
at C:\dev\workspaces\driveai\rclnodejs\lib\node.js:185:42
at Node._runWithMessageType (C:\dev\workspaces\driveai\rclnodejs\lib\node.js:1612:5)
at C:\dev\workspaces\driveai\rclnodejs\lib\node.js:180:12
at Array.forEach (<anonymous>)
at Node.execute (C:\dev\workspaces\driveai\rclnodejs\lib\node.js:172:24)
[foonathan::memory] Allocator foonathan::memory::new_allocator (at 0000000000000000) leaked 1856 bytes.
[foonathan::memory] Allocator foonathan::memory::heap_allocator (at 0000000000000000) leaked 17816 bytes.
Thanks for trying the patches out, it seems that we have to make some changes accordingly. We may consider applying #723 (the situation may have changed since it was implemented in last year). BTW, what is your nodejs versoin?
opps accidentally omitted that the stack trace above was generated on ros2 foxy, window 10, vsc 2017, node 16.3.0.
I remember @koonpeng ever submitted a PR #774 to get rid of the dependency of ref-napi
I have reservations about depending on that PR to fix this issue, it is very experimental and it breaks several apis of rclnodejs. I also don't think I have the time to work on it in the foreseeable future.
Hi @koonpeng thanks for updating the status and it seems that we'd better try to fix the bug of ref-napi
instead.
After forking ref-napi
, ref-struct-di
and ref-array-di
and making some changes, we can pass all the unit tests on latest galactic release now :smile: , if you want to experience rclnodejs
with nodejs >= 14 please check out branch https://github.com/RobotWebTools/rclnodejs/tree/nodejs-16
I think the issue was fixed, so close it.
When using the latest Node.js LTS Fermium v14.15.0, the unit test began to fail consistently and it also reproduces on v15.x versions. Backtrace is
and the corresponding PR #716