Closed Fernandojdk closed 3 years ago
If it helps, this is the output of dmesg:
[101724.308214] traps: opensips[8325] general protection ip:7f2b11444079 sp:7ffc9205d150 error:0 in libc-2.17.so[7f2b113f7000+1c3000]
[101735.792254] opensips[8294]: segfault at 8 ip 00007f2abb8d4802 sp 00007ffc9205e7e0 error 4 in dialog.so[7f2abb8a5000+7d000]
[108871.033118] opensips[25832]: segfault at 6a ip 00000000005362b8 sp 00007ffd6653b2d0 error 4 in opensips[400000+1fc000]
[109361.785576] opensips[12659]: segfault at 60 ip 00000000005362b8 sp 00007ffce7a77910 error 4 in opensips[400000+1fc000]
[109376.395498] opensips[12632]: segfault at 8 ip 00007f3337834802 sp 00007ffce7a77c60 error 4 in dialog.so[7f3337805000+7d000]
[113014.864189] opensips[16441]: segfault at 8 ip 00007f0af8f94802 sp 00007ffda2f6fbd0 error 4 in dialog.so[7f0af8f65000+7d000]
[113168.666018] opensips[7798]: segfault at 2d ip 00007f3f2d6c3079 sp 00007ffe07d171c0 error 4 in libc-2.17.so[7f3f2d676000+1c3000]
[113181.951784] opensips[7776]: segfault at 8 ip 00007f3ed7b53802 sp 00007ffe07d17d30 error 4 in dialog.so[7f3ed7b24000+7d000]
[114289.520314] traps: opensips[9156] general protection ip:5362b8 sp:7ffeadf63630 error:0 in opensips[400000+1fc000]
[114304.493978] opensips[9114]: segfault at 8 ip 00007f3654bff802 sp 00007ffeadf63980 error 4 in dialog.so[7f3654bd0000+7d000]
[114562.636829] traps: opensips[16966] general protection ip:5362b8 sp:7ffe2a4e62f0 error:0 in opensips[400000+1fc000]
[114577.568572] opensips[16955]: segfault at 8 ip 00007f214d6a0802 sp 00007ffe2a4e6640 error 4 in dialog.so[7f214d671000+7d000]
[115892.560088] opensips[18215]: segfault at 8 ip 00007f08e5a8d802 sp 00007ffee035f7d0 error 4 in dialog.so[7f08e5a5e000+7d000]
[116393.926393] traps: opensips[26137] general protection ip:5362b8 sp:7ffebafd10f0 error:0 in opensips[400000+1fc000]
[116412.988123] opensips[26129]: segfault at 8 ip 00007f773acde802 sp 00007ffebafd1440 error 4 in dialog.so[7f773acaf000+7d000]
[117852.971750] traps: opensips[28657] general protection ip:5362b8 sp:7fff6e4df780 error:0 in opensips[400000+1fc000]
[117869.847650] opensips[28627]: segfault at 8 ip 00007f696198e802 sp 00007fff6e4dfad0 error 4 in dialog.so[7f696195f000+7d000]
[118512.482171] traps: opensips[5905] general protection ip:4e1e78 sp:7ffef66711e0 error:0 in opensips[400000+1fc000]
[118532.311949] opensips[5884]: segfault at 8 ip 00007fa2cf982802 sp 00007ffef66724e0 error 4 in dialog.so[7fa2cf953000+7d000]
@bogdan-iancu Any instructions for dealing with this issue? This is happening many times a day, on different production environments and different hardware. I looked through the source code to find what else is trying to free a contact_t at the same time, but with no success. If you can give me some direction on where to look, I'll try to fix this and make a PR.
Thanks in advance.
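For anyone triaging the dmesg lines above, here is a minimal sketch (not part of OpenSIPS; the function names `parse_line`, `describe_page_fault` and `summarize` are made up for illustration) that turns each kernel line into an offset relative to the start of the mapping it crashed in, and decodes the x86 page-fault error code. It assumes the two line formats shown in the log and that the bracketed range is `base+size` of the mapping containing the faulting ip. Whether that offset can be passed directly to a symbolizer depends on how the binary is linked and mapped, so treat it as a grouping key first.

```python
import re
from collections import Counter

# Matches both kernel formats seen in the log:
#   "... segfault at 8 ip 0000... sp 0000... error 4 in dialog.so[base+size]"
#   "... general protection ip:... sp:... error:0 in opensips[base+size]"
LINE_RE = re.compile(
    r"(?P<kind>segfault|general protection)"
    r"(?: at (?P<addr>[0-9a-f]+))?"
    r" ip[ :](?P<ip>[0-9a-f]+)"
    r" sp[ :][0-9a-f]+"
    r" error[ :](?P<err>[0-9a-f]+)"
    r" in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_line(line):
    """Return a dict describing one crash line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "kind": m["kind"],
        "object": m["obj"],
        # Offset of the faulting instruction from the start of the mapping.
        "offset": int(m["ip"], 16) - int(m["base"], 16),
        # Faulting data address; only present on "segfault at ..." lines.
        "fault_addr": int(m["addr"], 16) if m["addr"] else None,
        "error": int(m["err"], 16),
    }


def describe_page_fault(error):
    """Decode the low three bits of an x86 page-fault error code."""
    mode = "user" if error & 0x4 else "kernel"
    access = "write" if error & 0x2 else "read"
    cause = "protection violation" if error & 0x1 else "page not present"
    return f"{mode}-mode {access}, {cause}"


def summarize(lines):
    """Count crashes per (object, offset, kind) so repeated sites stand out."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            counts[(rec["object"], hex(rec["offset"]), rec["kind"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[101724.308214] traps: opensips[8325] general protection ip:7f2b11444079 sp:7ffc9205d150 error:0 in libc-2.17.so[7f2b113f7000+1c3000]",
        "[101735.792254] opensips[8294]: segfault at 8 ip 00007f2abb8d4802 sp 00007ffc9205e7e0 error 4 in dialog.so[7f2abb8a5000+7d000]",
        "[108871.033118] opensips[25832]: segfault at 6a ip 00000000005362b8 sp 00007ffd6653b2d0 error 4 in opensips[400000+1fc000]",
    ]
    for key, count in summarize(sample).items():
        print(count, key)
    print(describe_page_fault(4))
```

Applied to the full log, all ten dialog.so lines reduce to the same offset, 0x2f802, each with fault address 8 and error 4 (a user-mode read of a page that is not present). That pattern is consistent with reading a field at offset 8 through a NULL pointer, though only the backtraces can confirm it.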
This is similar to #2095, and I'm chasing a possible race between a 200 OK and a CANCEL, a race that may lead to corruption of the SIP request cloned into shm.
I will close this one as a duplicate and continue on #2095, as it is older. Please monitor that one.
OpenSIPS version you are running
Installed via CentOS rpms - Latest version
Crash Core Dump
https://pastebin.com/YtDVEmUE (core for error in free_contacts)
https://pastebin.com/g1f8zAs2 (core generated together with error on _unref_dlg)
Describe the traffic that generated the bug
It's a production server. The crash happens randomly several times a day. It only happens under high traffic, when opensips load-all reaches >= 90%. More specifically, the crash occurs at 200 CPS (calls per second) or more.
Whenever the crash happens, it is in the same function: free_contacts. Every time the crash occurs, two core dump files of the same size are generated. One core dump has a segfault in tm's free_contacts function and the second has a segfault in dialog's _unref_dlg function. I posted a link to the bt full output for each core dump.
OpenSIPS has a lot of free SHM and a lot of free PKG memory.
To Reproduce
It is not possible to reproduce because it only happens with high traffic.
Relevant System Logs
OS/environment information
Additional context
I use dialog + topology_hiding (force_dialog = 1).
In branch_route I use uac_replace_from and uac_replace_to.
I use nathelper.
I'm using rtpengine for some calls. In branch_route I call nat_uac_test(1) and fix_nated_contact() before calling rtpengine_offer.
For some calls, I'm using SST.
Only proto_udp is loaded.
OpenSIPS udp_workers is 32.