OpenSIPS / opensips

OpenSIPS is a GPL implementation of a multi-functionality SIP Server that targets to deliver a high-level technical solution (performance, security and quality) to be used in professional SIP server platforms.
https://opensips.org
Other
1.28k stars 581 forks source link

opensips 2.4 (mobile proxy) crashes #1541

Open ivanoff946 opened 5 years ago

ivanoff946 commented 5 years ago

Hello guys! I use separate opensips for work this softphones on mobile platforms Android and Ios. Sometimes opensips is crashes, in logs i see:

Nov 21 16:29:20 opensips /usr/sbin/opensips[17122]: INFO:core:handle_sigs: child process 17177 exited by a signal 7
Nov 21 16:29:20 opensips /usr/sbin/opensips[17122]: INFO:core:handle_sigs: core was generated
Nov 21 16:29:20 opensips /usr/sbin/opensips[17122]: INFO:core:handle_sigs: terminating due to SIGCHLD
Nov 21 16:29:20 opensips /usr/sbin/opensips[17221]: CRITICAL:core:handle_worker: dead child 25 (EOF received), pid 17177
Nov 21 16:29:20 opensips /usr/sbin/opensips[17122]: INFO:core:handle_sigs: child process 17177 exited by a signal 7
Nov 21 16:29:20 opensips /usr/sbin/opensips[17122]: INFO:core:handle_sigs: core was generated
Nov 21 16:29:20 opensips /usr/sbin/opensips[17122]: INFO:core:handle_sigs: terminating due to SIGCHLD
Nov 21 16:29:20 opensips /usr/sbin/opensips[17221]: CRITICAL:core:handle_worker: dead child 25 (EOF received), pid 17177

I have trace and core. I can send core to you in private channel.

opensips -V:

version: opensips 2.4.3 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, F_MALLOC, FAST_LOCK-FUTEX-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: e258956
main.c compiled on 15:18:30 Nov 20 2018 with gcc 4.8.5

trace-mobile-proxy.txt

Thank you!

bogdan-iancu commented 5 years ago

@ivanoff946, please print the followings in gdb, frame 0 : p *hdr p ((contact_body_t *)hdr->parsed)->contacts

ivanoff946 commented 5 years ago

@bogdan-iancu Sorry, i loose my core file! Soon as possible i attach new trace file. Once again, I'm sorry.

ivanoff946 commented 5 years ago

Hello @bogdan-iancu ! I attach new trace here. It is the same problem. trace3.txt

Thank you!

ivanoff946 commented 5 years ago

p hdr p ((contact_body_t )hdr->parsed)->contacts

for trace3.txt:

(gdb) f 0
#0  0x00007f4034612931 in fake_req (faked_req=0x7f403483b580 <faked_req.12035>, uac=0x0, with_dst=0, uas=0x7f403dfacfd0, shm_msg=0x7f403e0bc3b0) at t_msgbuilder.h:238
238
(gdb) p *hdr
$1 = {type = HDR_CONTACT_T, name = {
    s = 0x7f403e0bccb4 "Contact: <sip:79135951595@1.1.1.11;did=173.6ef77b04>\r\nTo: <sip:user2@400122741.domain.ru>\r\nFrom: \"79135951595\"<sip:79135951595@domain.ru>;tag=a520d934\r\nCall-ID: rQjCm3WUdeSudVPqjoJkAQ..\r\nCS"..., len = 7}, body = {
    s = 0x7f403e0bccbd "<sip:79135951595@1.1.1.11;did=173.6ef77b04>\r\nTo: <sip:user2@400122741.domain.ru>\r\nFrom: \"79135951595\"<sip:79135951595@domain.ru>;tag=a520d934\r\nCall-ID: rQjCm3WUdeSudVPqjoJkAQ..\r\nCSeq: 1 INV"..., len = 46}, len = 57, parsed = 0x0, next = 0x7f403e0bd230, sibling = 0x0}
(gdb) p ((contact_body_t *)hdr->parsed)->contacts
Cannot access memory at address 0x8
(gdb)
bogdan-iancu commented 5 years ago

@ivanoff946 , according to the traces, the crash happens in a notification route (via event based routing), when injecting some new branches - are you doing a notify_on_event("E_UL_AOR_INSERT"..) ? or ?

ivanoff946 commented 5 years ago

Yes, i have this code:

    if ((is_from_gw("9"))||($si == "OPENSIPS_HOST")) {
        xlog("L_INFO", "Call from main opensips");
        $var(search_user) = $tU;
        $var(search_domain) = $td;
        $var(search_type) = "dst";
        route(get_user);        
        route(lookup);
        route(rabbitmq);
        t_newtran();
        t_wait_for_new_branches();
        $avp(filter) = "aor=" + $tU + "@" + $td;
        notify_on_event("E_UL_CONTACT_INSERT","$avp(filter)","INSERT_CT","120");
        route(RELAY);
ivanoff946 commented 5 years ago

Hello @bogdan-iancu! I think i understood the meaning of the problem. I use this route with notify_on_event (from above code):

route[INSERT_CT] {
    xlog("L_INFO", "[INSERT_CT] New contact [$avp(uri)] registered for user [$avp(aor)], inserting new branch to ongoing call\r\n");
    usleep("1000");
    t_inject_branches("event");
}

Opensips crashes, when one worker calls t_inject_branches("event"); repeatedly for same "Contact". For example, logs:

Dec 18 09:40:41 opensips /usr/sbin/opensips[4196]: [INSERT_CT] New contact [sip:v0011@1.1.188.12:38909;ob] registered for user [v0011@example.domain.com], inserting new branch to ongoing call
Dec 18 09:40:41 opensips /usr/sbin/opensips[4201]: [INSERT_CT] New contact [sip:v0011@1.1.188.12:38909;ob] registered for user [v0011@example.domain.com], inserting new branch to ongoing call
Dec 18 09:40:41 opensips /usr/sbin/opensips[4196]: [INSERT_CT] New contact [sip:v0011@1.1.188.12:38909;ob] registered for user [v0011@example.domain.com], inserting new branch to ongoing call
Dec 18 09:40:41 opensips /usr/sbin/opensips[4201]: [INSERT_CT] New contact [sip:v0011@1.1.188.12:38909;ob] registered for user [v0011@example.domain.com], inserting new branch to ongoing call
Dec 18 09:40:41 opensips /usr/sbin/opensips[4196]: [INSERT_CT] New contact [sip:v0011@1.1.188.12:38909;ob] registered for user [v0011@example.domain.com], inserting new branch to ongoing call
Dec 18 09:40:41 opensips /usr/sbin/opensips[4201]: [INSERT_CT] New contact [sip:v0011@1.1.188.12:38909;ob] registered for user [v0011@example.domain.com], inserting new branch to ongoing call
Dec 18 09:40:41 opensips /usr/sbin/opensips[4245]: CRITICAL:core:handle_worker: dead child 23 (EOF received), pid 4196
Dec 18 09:40:41 opensips /usr/sbin/opensips[4140]: INFO:core:handle_sigs: child process 4196 exited by a signal 7
Dec 18 09:40:41 opensips /usr/sbin/opensips[4140]: INFO:core:handle_sigs: core was not generated
Dec 18 09:40:41 opensips /usr/sbin/opensips[4140]: INFO:core:handle_sigs: terminating due to SIGCHLD

Here worker with pid 4196 calls t_inject_branches("event"); for same contact twice and after that worker crashes. I have statistics from many crashing, and everywhere i see that one worker twice calls t_inject_branches("event"); for same contact and after that worker is crashes.

achalkov commented 5 years ago

Question is - why E_UL_CONTACT_INSERT raises multiple times on same contact. As said in documentation - "This event is raised when a new contact is inserted in any of the existing AOR's contact list.", so imo it should raises once per new contact insertion.

bogdan-iancu commented 5 years ago

Maybe the event is raised only once, but there are multiple subscribers for it (due notify_on_event on different calls). Keep in mind that the subscription does not "die" when the corresponding transaction is ended - the subscription to events are transaction independent, they have their lifetime dictated only by the expire value. So, the even will still trigger the notification (even if the transaction was terminated), but the t_inject_branch() should fail as (1) it does not find the transaction or (2) the found transaction does not accept new branches.

bogdan-iancu commented 5 years ago

Collecting and compiling information from other reports, we narrow down the possible causes of such crashes to a race condition (on operating over the msg save in shmem in transaction) between"

(a) the process doing the cleanup after an async or notify_on_event call and (b) another process building a fake request due running a failure route or inject_branch