OpenSIPS / opensips

OpenSIPS is a GPL implementation of a multi-functionality SIP Server that targets to deliver a high-level technical solution (performance, security and quality) to be used in professional SIP server platforms.
https://opensips.org

[CRASH] with rtp_relay module on race condition #3417

Open telematico opened 2 weeks ago

telematico commented 2 weeks ago

OpenSIPS version you are running

OpenSIPS 3.4.5-1, installed from https://apt.opensips.org on Debian bullseye

Crash Core Dump

https://pastebin.com/kjVFEp7i https://pastebin.com/UUGtPYP2

Describe the traffic that generated the bug

It happens after a race condition (200 OK vs. CANCEL) when using the rtp_relay module.

image

When my B2BUA receives the 200 OK (after it has already sent the CANCEL, of course), it tries to re-INVITE the call. The dialog in OpenSIPS has already ended (because of the CANCEL), so this is, of course, a bogus situation. I tried delaying dialog deletion with these parameters:

modparam("dialog", "delete_delay", 10)
modparam("dialog", "race_condition_timeout", 5)

but the result is still the same.

To Reproduce

Trigger an event on a dialog that is in state 5. The dialog must have the rtp_relay module active (I use it with RTPengine).

Relevant System Logs


Jun 20 10:24:48 proxy1 /usr/sbin/opensips[127609]: WARNING:dialog:log_next_state_dlg: bogus event 8 in state 5 for dlg 0x7f6a8c0a6720 [7475:1639102538] with clid '2f3ce88810f521be121810407d93b040@1.1.1.1:5060' and tags 'as67b25d99' 'FBD6B068-211A'
Jun 20 10:24:48 proxy1 /usr/sbin/opensips[127609]: CRITICAL:core:sig_usr: segfault in process pid: 127609, id: 28
Jun 20 10:24:48 proxy1 kernel: [101498.645017] opensips[127609]: segfault at 4 ip 00007f6a88e058b8 sp 00007ffcdb558b70 error 6 in rtp_relay.so[7f6a88dfc000+15000]
Jun 20 10:24:48 proxy1 kernel: [101498.646303] Code: 00 41 55 55 50 48 8d 05 86 cc 00 00 50 48 8b 05 2e b6 01 00 8b 30 31 c0 e8 d5 6b ff ff 4c 89 e4 e9 b9 fd ff ff 0f 1f 44 00 00 <83> 63 04 f7 e9 e1 fd ff ff 0f 1f 80 00 00 00 00 a8 08 0f 84 96 fd
Jun 20 10:24:48 proxy1 /usr/sbin/opensips[127581]: INFO:core:handle_sigs: child process 127609 exited by a signal 11
Jun 20 10:24:48 proxy1 /usr/sbin/opensips[127581]: INFO:core:handle_sigs: core was not generated
Jun 20 10:24:48 proxy1 /usr/sbin/opensips[127581]: INFO:core:handle_sigs: terminating due to SIGCHLD
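Since no core was generated for this particular crash, the kernel segfault line is the main clue. The sketch below is only an illustration: the helper name `decode_segfault` is hypothetical, and the bit meanings assume the usual x86 page-fault error code layout (bit 0 = page present, bit 1 = write, bit 2 = user mode). It extracts the faulting address and the instruction offset inside the rtp_relay.so mapping.

```python
import re

# Kernel segfault line copied from the log above.
LINE = ("opensips[127609]: segfault at 4 ip 00007f6a88e058b8 "
        "sp 00007ffcdb558b70 error 6 in rtp_relay.so[7f6a88dfc000+15000]")

# All numeric fields in this kernel message are printed in hexadecimal.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Hypothetical helper: split a kernel segfault line into its fields."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    err = int(m["err"], 16)
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction from the start of the mapping.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        "page_present": bool(err & 1),  # assumed x86 error-code bit 0
        "write": bool(err & 2),         # assumed x86 error-code bit 1
        "user_mode": bool(err & 4),     # assumed x86 error-code bit 2
    }


info = decode_segfault(LINE)
print(f"{info['object']}+{info['offset']:#x}", info)
```

For this log, the fault is a user-mode write to address 4 on a page that is not mapped, which would be consistent with writing to a field at offset 4 of a NULL pointer. The computed offset is relative to the mapping reported by the kernel; depending on how the library is mapped, the mapping's file offset may need to be added before passing it to a tool such as addr2line.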

OS/environment information

Additional context

telematico commented 2 weeks ago

Description updated with two similar backtraces from two different core files.

telematico commented 2 weeks ago

Tested with the latest 3.4 revision from git and it still segfaults.

version: opensips 3.4.6 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: 26e9a75d2
main.c compiled on  with gcc 10

Here is a new paste with the backtrace; the crash appears to happen while freeing resources, not in the lock: https://pastebin.com/5Qspb2Kz

#7  rtp_copy_ctx_free (copy_ctx=0x0) at rtp_relay_ctx.c:244
        __FUNCTION__ = "\211|$\030D\211t$\020L\215E\004L\211D$\b"
#8  rtp_relay_ctx_free (ctx=0x7f40dc8521c0) at rtp_relay_ctx.c:283
        it = 0x20
        safe = 0x7f40dcdcb0b8
        __FUNCTION__ = "l$@L\211\357\350E&\376\377H\213-F\003\001\000H"

The error seems to occur in the same situation, right after this log message: WARNING:dialog:log_next_state_dlg: bogus event 8 in state 5 for dlg