MayamaTakeshi / sip-lab

A node module that helps to write SIP functional tests
3 stars 2 forks source link

Occasional crash when executing samples/mrcp_and_audio.js #49

Closed MayamaTakeshi closed 6 months ago

MayamaTakeshi commented 1 year ago
12:07:26.558  Match successful
12:07:26.558  
12:07:26.558  wait (mrcp_and_audio.js:224) got expected event:
12:07:26.558    {
    event: 'mrcp_msg',
    call_id: 0,
    msg: 'MRCP/2.0 87 SPEAK-COMPLETE 1 COMPLETE\r\n' +
  'channel-identifier: 32AECB234338@speechsynth\r\n' +
  '\r\n'
  }
12:07:26.558  All expected events received
12:07:26.558  wait (mrcp_and_audio.js:224) finished
12:07:26.558  All expected events received
12:07:26.558  wait (mrcp_and_audio.js:224) finished
pjw_call_reinvite call_id=0

process_media call_id=0
itr is object
itr is object
i=0 media found

i=1 media found

process_media call_id=0 call->local_sdp: v=0
o=- 3888011246 3888011246 IN IP4 0.0.0.0
s=pjmedia
t=0 0
m=application 9 TCP/MRCPv2 1
c=IN IP4 127.0.0.1
a=setup:active
a=connection:new
a=resource:speechsynth
a=cmid:1
m=audio 10000 RTP/AVP 0 8 3 96 97 98 99 9 18 120 121 122
c=IN IP4 127.0.0.1
b=TIAS:64000
a=rtcp:10001 IN IP4 127.0.0.1
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:3 GSM/8000
a=rtpmap:96 iLBC/8000
a=fmtp:96 mode=30
a=rtpmap:97 speex/32000
a=rtpmap:98 speex/16000
a=rtpmap:99 speex/8000
a=rtpmap:9 G722/8000
a=rtpmap:18 G729/8000
a=rtpmap:120 telephone-event/8000
a=fmtp:120 0-16
a=rtpmap:121 telephone-event/32000
a=fmtp:121 0-16
a=rtpmap:122 telephone-event/16000
a=fmtp:122 0-16
a=recvonly
a=mid:1

corrupted size vs. prev_size
./runtests: line 3: 33334 Aborted                 (core dumped) node $i
samples/mrcp_and_audio.js failed
takeshi@takeshi-desktop:~/src/git/MayamaTakeshi/sip-lab$ 
MayamaTakeshi commented 1 year ago

Of more than a thousand test executions, only the ones for mrcp_and_audio.js (which is the only one that writes/reads TCP data) got crashes:

takeshi@takeshi-desktop:~/src/git/MayamaTakeshi/sip-lab$ cat res.txt | grep 0 | wc -l
1175

takeshi@takeshi-desktop:~/src/git/MayamaTakeshi/sip-lab$ cat res.txt | grep 0 -v | wc -l
20

takeshi@takeshi-desktop:~/src/git/MayamaTakeshi/sip-lab$ cat res.txt | grep -v 0 | sort
samples/mrcp_and_audio.js: 134
samples/mrcp_and_audio.js: 134
samples/mrcp_and_audio.js: 134
samples/mrcp_and_audio.js: 134
samples/mrcp_and_audio.js: 134
samples/mrcp_and_audio.js: 139
samples/mrcp_and_audio.js: 139
samples/mrcp_and_audio.js: 139
samples/options.js: 1
samples/options.js: 1
samples/options.js: 1
samples/options.js: 1
samples/options.js: 1
samples/register_subscribe.js: 1
samples/register_subscribe.js: 1
samples/register_subscribe.js: 1
samples/register_subscribe.js: 1
samples/register_subscribe.js: 1
samples/register_subscribe.js: 1
samples/two_audio_media.js: 1

So indeed there is an implementation bug.

MayamaTakeshi commented 1 year ago

Testing client and sides separately:

Testing speech synth client against mrcp_server get crashes:

07:42:22.671  
07:42:22.671  wait (b.js:127) got event:
07:42:22.671    {
    event: 'mrcp_msg',
    call_id: 0,
    msg: 'MRCP/2.0 142 SPEAK-COMPLETE 1 COMPLETE\r\n' +
  'channel-identifier: dc7318a7-d51a-46f0-af98-aa893c351428@speechsynth\r\n' +
  'Completion-Cause: 000 normal\r\n' +
  '\r\n'
  }
07:42:22.671  
07:42:22.671  Trying match against expected_events[0]:
07:42:22.672    partial_match({
    event: 'mrcp_msg',
    call_id: 0,
    msg: mrcp_msg({
      type: 'event',
      version: '2.0',
      request_id: 1,
      event_name: 'SPEAK-COMPLETE',
      request_state: 'COMPLETE'
    })
  })
malloc_consolidate(): invalid chunk size
./r2.sh: line 10: 13312 Aborted                 (core dumped) node b.js

Testing speech synth server with calls generated by mrcp_client also get crashes:

channel-identifier: 32AECB234338@speechsynth
speech-language: dtmf
content-type: text/plain
content-length: 16

1234567890abcd*#
on_data_read msg_size=155
on_data_read msg={"event": "mrcp_msg", "call_id": 10}
MRCP/2.0 155 SPEAK 1
channel-identifier: 32AECB234338@speechsynth
speech-language: dtmf
content-type: text/plain
content-length: 16

1234567890abcd*#
dispach_event called with evt={"event": "mrcp_msg", "call_id": 10}
MRCP/2.0 155 SPEAK 1
channel-identifier: 32AECB234338@speechsynth
speech-language: dtmf
content-type: text/plain
content-length: 16

1234567890abcd*#

09:22:18.639  
09:22:18.639  wait (c.js:101) got event:
09:22:18.639    {
    event: 'mrcp_msg',
    call_id: 10,
    msg: 'MRCP/2.0 155 SPEAK 1\r\n' +
  'channel-identifier: 32AECB234338@speechsynth\r\n' +
  'speech-language: dtmf\r\n' +
  'content-type: text/plain\r\n' +
  'content-length: 16\r\n' +
  '\r\n' +
  '1234567890abcd*#'
  }
09:22:18.640  
09:22:18.640  Trying match against expected_events[0]:
Segmentation fault (core dumped)
MayamaTakeshi commented 1 year ago

We are seeing crashes also in node when running the mrcp test:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f41de4a547e in __GI___libc_free (mem=0x400000000000) at ./malloc/malloc.c:3368
3368    ./malloc/malloc.c: No such file or directory.
[Current thread is 1 (Thread 0x7f41dea497c0 (LWP 53432))]
(gdb) bt
#0  0x00007f41de4a547e in __GI___libc_free (mem=0x400000000000) at ./malloc/malloc.c:3368
#1  0x00007f41deaa9aae in free (ptr=<optimized out>) at ../include/rtld-malloc.h:50
#2  __GI__dl_deallocate_tls (tcb=tcb@entry=0x7f41d5832640, dealloc_tcb=dealloc_tcb@entry=false) at ../elf/dl-tls.c:646
#3  0x00007f41de4914c5 in __nptl_free_stacks (limit=41943040) at ./nptl/nptl-stack.c:78
#4  0x00007f41de491625 in queue_stack (stack=0x7f41dcbfc640) at ./nptl/nptl-stack.c:106
#5  __GI___nptl_deallocate_stack (pd=pd@entry=0x7f41dcbfc640) at ./nptl/nptl-stack.c:123
#6  0x00007f41de4917fd in __GI___nptl_free_tcb (pd=0x7f41dcbfc640) at ./nptl/nptl_free_tcb.c:41
#7  0x00007f41de49661c in __pthread_clockjoin_ex (threadid=139920853157440, thread_return=thread_return@entry=0x0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, block=block@entry=true)
    at ./nptl/pthread_join_common.c:128
#8  0x00007f41de496543 in ___pthread_join (threadid=<optimized out>, thread_return=thread_return@entry=0x0) at ./nptl/pthread_join.c:24
#9  0x000000000165b25e in uv_thread_join (tid=<optimized out>) at ../deps/uv/src/unix/thread.c:273
#10 0x0000000000bd8b49 in node::WorkerThreadsTaskRunner::Shutdown() ()
#11 0x0000000000bd8ef8 in node::NodePlatform::Shutdown() ()
#12 0x0000000000ab196a in node::DefaultProcessExitHandler(node::Environment*, int) ()
#13 0x0000000000af1733 in node::Environment::Exit(int) ()
#14 0x0000000000da5fa0 in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#15 0x0000000000da74df in v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) ()
#16 0x00000000016e9af9 in Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit ()
#17 0x000000000166da50 in Builtins_InterpreterEntryTrampoline ()
#18 0x0000060a671415b9 in ?? ()
#19 0x000020282f1860d9 in ?? ()
#20 0x0000000600000000 in ?? ()
#21 0x0000060a67141689 in ?? ()
#22 0x000020282f1827d9 in ?? ()
#23 0x0000000000000000 in ?? ()
(gdb) 

But this cannot be reported as they say (in issue-34174):

If find node_modules/ -name \*.node finds any files, chances are good the bug stems from one of those. That first stack trace is most likely caused by an earlier memory corruption and native modules are usually the source of those.
MayamaTakeshi commented 1 year ago

I tried to use send_key as part of MrcpEndpoint struct and allocate memory for the data from g_pool to make sure it doesn't happen because they are release too soon. But the crashes continue to happen.

I also tried a 1 second delay after answer before sending the first MRCP message but the crash persists.

MayamaTakeshi commented 1 year ago

We should isolate the activesock code in a main.c file and see this reproducible without node.