confluentinc / confluent-kafka-javascript

Confluent's Apache Kafka JavaScript client
https://www.npmjs.com/package/@confluentinc/kafka-javascript
MIT License
83 stars 8 forks source link

Segfault in e2e/groups.spec.js #164

Open trevorr opened 1 week ago

trevorr commented 1 week ago

Environment Information

Steps to Reproduce

$ npx mocha --timeout 120000 e2e/groups.spec.js

  Consumer group/Producer
[1]    13601 segmentation fault  npx mocha --timeout 120000 e2e/

Here's the stack trace:

llnode -- node ./node_modules/.bin/mocha  --timeout 120000 e2e/groups.spec.js
(lldb) target create "node"
Current executable set to '/opt/node-debug/bin/node' (arm64).
(lldb) settings set -- target.run-args  "./node_modules/.bin/mocha" "--timeout" "120000" "e2e/groups.spec.js"
(lldb) plugin load '/Users/trevor/.nvm/versions/node/v20.18.0/lib/node_modules/llnode/llnode.dylib'
(lldb) settings set prompt '(llnode) '
(llnode) r
Process 11240 launched: '/opt/node-debug/bin/node' (arm64)

  Consumer group/Producer
Process 11240 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000128807018 confluent-kafka-javascript.node`Nan::Callback::Call_(v8::Isolate*, v8::Local<v8::Object>, int, v8::Local<v8::Value>*, Nan::AsyncResource*) const [inlined] v8::Local<v8::Function>::New(isolate=0x0000000140008000, that=0x0000000000000000) at v8-local-handle.h:287:68 [opt]
   284  
   285    V8_INLINE static Local<T> New(Isolate* isolate,
   286                                  const PersistentBase<T>& that) {
-> 287      return New(isolate, internal::ValueHelper::SlotAsValue<T>(that.val_));
   288    }
   289  
   290    V8_INLINE static Local<T> New(Isolate* isolate,
warning: confluent-kafka-javascript.node was compiled with optimization - stepping may behave oddly; variables may not be available.
(llnode) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000128807018 confluent-kafka-javascript.node`Nan::Callback::Call_(v8::Isolate*, v8::Local<v8::Object>, int, v8::Local<v8::Value>*, Nan::AsyncResource*) const [inlined] v8::Local<v8::Function>::New(isolate=0x0000000140008000, that=0x0000000000000000) at v8-local-handle.h:287:68 [opt]
    frame #1: 0x0000000128807018 confluent-kafka-javascript.node`Nan::Callback::Call_(v8::Isolate*, v8::Local<v8::Object>, int, v8::Local<v8::Value>*, Nan::AsyncResource*) const [inlined] v8::Local<v8::Function> Nan::New<v8::Function, v8::NonCopyablePersistentTraits<v8::Function>>(p=0x0000000000000000) at nan_implementation_12_inl.h:422:10 [opt]
    frame #2: 0x0000000128807014 confluent-kafka-javascript.node`Nan::Callback::Call_(this=0x0000000000000000, isolate=<unavailable>, target=(val_ = 0x000000013800aee8), argc=4, argv=0x000000016fdf6968, resource=0x000000016fdf68a0) const at nan.h:1880:36 [opt]
    frame #3: 0x0000000128802fac confluent-kafka-javascript.node`Nan::Callback::Call(this=0x0000000000000000, argc=4, argv=0x000000016fdf6968) const at nan.h:1824:25 [opt]
    frame #4: 0x0000000128825a54 confluent-kafka-javascript.node`NodeKafka::Workers::KafkaConsumerConsumeLoop::HandleMessageCallback(this=0x000000014f708810, msg=0x00006000023a8d80, ec=<unavailable>) at workers.cc:775:13 [opt]
    frame #5: 0x0000000128829e90 confluent-kafka-javascript.node`NodeKafka::Workers::MessageWorker::WorkMessage(this=0x000000014f708810) at workers.h:110:7 [opt]
    frame #6: 0x0000000100b88d08 node`uv__async_io(loop=0x00000001041d4c90, w=<unavailable>, events=<unavailable>) at async.c:176:5 [opt]
    frame #7: 0x0000000100b9b9b0 node`uv__io_poll(loop=0x00000001041d4c90, timeout=0) at kqueue.c:381:9 [opt]
    frame #8: 0x0000000100b89270 node`uv_run(loop=0x00000001041d4c90, mode=UV_RUN_DEFAULT) at core.c:447:5 [opt]
    frame #9: 0x0000000100005818 node`node::SpinEventLoopInternal(env=0x0000000138018c00) at embed_helpers.cc:41:7 [opt]
    frame #10: 0x00000001001347ac node`node::NodeMainInstance::Run(this=<unavailable>, exit_code=0x000000016fdff184, env=0x0000000138018c00) at node_main_instance.cc:124:9 [opt]
    frame #11: 0x00000001001344b8 node`node::NodeMainInstance::Run(this=<unavailable>) at node_main_instance.cc:100:3 [opt]
    frame #12: 0x00000001000b02c4 node`node::Start(int, char**) [inlined] node::StartInternal(argc=<unavailable>, argv=<unavailable>) at node.cc:1502:24 [opt]
    frame #13: 0x00000001000b01d4 node`node::Start(argc=<unavailable>, argv=<unavailable>) at node.cc:1509:27 [opt]
    frame #14: 0x00000001863b4274 dyld`start + 2840

(This even occurs with a fix for #161.)

milindl commented 1 week ago

Thank you for the report. I can't reproduce it, but looking at valgrind and the stacktrace, I think it's the same one as #34 . As far as I understand, it's because we're sending the message from the consume loop thread, to the event loop thread, but by the time it makes to the event loop thread, the callback it's supposed to call is destroyed already.

It seems to be specific to the consume() loop in the non-promisified API.

I'll keep both around until I'm certain that they are the same.