Closed olegans1972 closed 4 years ago
Please follow this guide and obtain a proper corefile and gdb backtrace. Alternatively, please provide some steps to reproduce this issue and an example opensips.cfg.
There is literally 0 information to work with right now.
Hi, I saw this guide. The only thing I didn't do is "ulimit -c unlimited". Could you explain where I can insert this command if I do not have init.d at all? I have CentOS 7.
Just make sure you call ulimit -c unlimited before the opensips start command (e.g. opensips -f /etc/opensips.cfg -m 256 -M 32). The initscript would do this exact same thing.
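On a host without an init script, the advice above could be wrapped in a small launcher. This is only a sketch: the helper name run_with_cores is made up for illustration and is not part of OpenSIPS, and the start command is the example one quoted above.

```shell
#!/bin/sh
# Sketch: raise the soft core-size limit, then run the given command.
# run_with_cores is a hypothetical helper name, not part of OpenSIPS.
run_with_cores() {
    (
        # Try "unlimited" first; if the hard limit forbids it, fall back
        # to the hard limit, which is the most an unprivileged user can get.
        ulimit -S -c unlimited 2>/dev/null || ulimit -S -c "$(ulimit -H -c)"
        exec "$@"
    )
}

# Example start command taken from the comment above:
#   run_with_cores opensips -f /etc/opensips.cfg -m 256 -M 32
```

The subshell keeps the limit change local to the launched process and its children, so the calling shell is left untouched.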
Try this: https://unix.stackexchange.com/a/345596.
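On a systemd-based system such as CentOS 7, one common way to get the same effect is a drop-in that sets LimitCORE for the service. The sketch below only writes such a drop-in; the unit name opensips.service, the file name core.conf and the helper name are assumptions, so adjust them to the local setup.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that lifts the core-size limit for a unit.
# write_core_dropin, the unit name and the file name core.conf are assumptions.
write_core_dropin() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/core.conf" <<'EOF'
[Service]
LimitCORE=infinity
EOF
}

# Possible use as root, assuming the unit is called opensips.service:
#   write_core_dropin /etc/systemd/system/opensips.service.d
#   systemctl daemon-reload
#   systemctl restart opensips
```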
Thank you!
opensips.txt This is my config. As soon as the core file is ready, I'll post it.
So, I have 14 core files. Do you need a backtrace for each of them?
First two files:
(gdb) bt full
url_len = 20
index = 1
cb = 0x7f471a75a590
http_root = 0x0
__FUNCTION__ = "get_httpd_cb"
upload_data_size=0x7ffe7d373b90, con_cls=0x2234e78) at httpd_proc.c:647
page = {s = 0x0, len = 0}
response = <optimized out>
ret = <optimized out>
cb = 0x0
normalised_url = <optimized out>
pr = 0x0
kv = <optimized out>
p = <optimized out>
ret_code = 200
sv_sockfd = <optimized out>
addrlen = 16
cl_socket = 0x220c7d0
__FUNCTION__ = "answer_to_connection"
No symbol table info available.
No symbol table info available.
No symbol table info available.
No symbol table info available.
No symbol table info available.
status = <optimized out>
rs = {__fds_bits = {1024, 0 <repeats 15 times>}}
ws = {__fds_bits = {0 <repeats 16 times>}}
es = {__fds_bits = {0 <repeats 16 times>}}
max = 10
cb = <optimized out>
__FUNCTION__ = "httpd_proc"
tv = {tv_sec = 0, tv_usec = 868327}
saddr_in = {sin_family = 2, sin_port = 47138, sin_addr = {s_addr = 0}, sin_zero = "\000\000\000\000\000\000\000"}
No symbol table info available.
No symbol table info available.
(gdb) bt full
No symbol table info available.
No symbol table info available.
No symbol table info available. (gdb)
(gdb) bt full
No symbol table info available.
No symbol table info available.
No symbol table info available.
No symbol table info available. (gdb)
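With 14 core files, running "bt full" by hand gets tedious. A rough sketch of how it could be scripted follows. The core.* file naming is assumed from the core.20429 name used in this thread, the binary path is passed in by the caller, and collect_backtraces is a made-up helper name.

```shell
#!/bin/sh
# Sketch: dump "bt full" for every core file in a directory,
# writing one text file per core (bt.<corename>.txt).
# GDB can be overridden, e.g. GDB=echo for a dry run that only
# records the command-line arguments instead of running gdb.
collect_backtraces() {
    binary="$1"
    core_dir="$2"
    for core in "$core_dir"/core.*; do
        [ -f "$core" ] || continue
        "${GDB:-gdb}" -batch -ex "bt full" "$binary" "$core" \
            > "$core_dir/bt.${core##*/}.txt" 2>&1
    done
}

# Possible use, assuming the cores were written to /tmp:
#   collect_backtraces /usr/sbin/opensips /tmp
```

If most frames still show "No symbol table info available", the matching debug symbols are probably missing for those libraries.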
In core.20429, could you please provide the output of the following commands:
(gdb) frame 0
(gdb) print *cb
(gdb) print cb->http_root
Thanks!
(gdb) frame 0
177 in httpd_proc.c
(gdb) print *cb
$1 = {module = 0x1100000000000000 <Address 0x1100000000000000 out of bounds>, http_root = 0x0, callback = 0xf800000000000000, flush_data_callback = 0x7f472a7667, init_proc_callback = 0x900000000000000, type = HTTPD_STD_CNT_TYPE, next = 0x500007f471adb61}
(gdb) print cb->http_root
$2 = (str *) 0x0
(gdb)
What does /var/log/opensips.log say? Any relevant errors before the crash?
Run opensipsctl fifo get_statistics shmem: before and during the crash and notice if it gets close to 100% usage.
/var/log/opensips.log:
Mar 16 15:21:38 kv-spx-1 /usr/sbin/opensips[20450]: failure_route[VMS_FAILOVER],1584372098620709|F693FDD02FD148ED13ED3997@0770ffffffff|Rcv|INVITE|sctp|10.93.137.170:54949|10.161.20.225:5060|sip.from=9647748752;sip.to=9282000026,
Failed FS
We use 256 MB shared memory
This is real_used_size before and after alarm. The alarm was at (15:26(UTC) + 3)
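The shared memory check requested above can be reduced to a single percentage. The sketch below assumes the statistics arrive on stdin one per line, with the numeric value as the last whitespace-separated field (for example "shmem:total_size:: 268435456"). That output layout is an assumption about this OpenSIPS version, and shm_usage_percent is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print real_used_size as a percentage of total_size.
# Assumes one statistic per line with the value as the last field.
shm_usage_percent() {
    awk '
        /shmem:total_size/     { total = $NF }
        /shmem:real_used_size/ { used  = $NF }
        END {
            if (total > 0) { printf "%.1f\n", used * 100 / total }
            else           { exit 1 }
        }
    '
}

# Possible use against a live instance:
#   opensipsctl fifo get_statistics shmem: | shm_usage_percent
```

Sampling this before and during a crash window shows whether usage gets anywhere near 100%.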
Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.
Which update? I attached all the information that was requested from me.
Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.
I'm waiting for a fix.
Hello @liviuchircu! Could you please tell me when we can expect the new 2.4.8 release? Is it possible to make a patch for 2.4.7 that addresses this problem?
hey, @olegans1972 ! This is a generic shared memory corruption problem, since your monitoring shows you have enough SHM (the crash takes place below 10% usage). To troubleshoot it, I need to understand what kind of code is called and how the corruption can happen. Please provide the following:
opensips.cfg file. If privacy is an issue, send it to liviu@opensips.org
"... opensips-minimal.cfg, while sending 100 CPS with sipp, it will crash within 2000 calls."
If you can provide this kind of massive help, the fix will follow up very soon.
Many thanks and apologies for the dead time. We are heavily pushing with 3.1 development these days and I didn't prioritize my GitHub issues so much. Getting back to them now :)
Hey @liviuchircu! I attached my opensips.cfg when I opened this issue; that was on 16.03.
Hi @liviuchircu! It is very difficult to catch and reproduce this situation. I tried to run a typical scenario on a test system. The call flow is as follows: SippUAC -> Opensips <---> HttpServerRequest (Response) -> SippUAS. The test ran with the following parameters: 25,000 calls, 100 simultaneous calls and 100 calls per second. One call lasts 2 seconds. I can see shared memory growing. If I run more than 30,000 calls, a segfault occurs because shared memory runs out.
It seemed strange to me that 10 minutes after the end of the test, the memory shown in top is still large. Perhaps my problem is that memory is gradually accumulating. It is strange that on the prod system I do not see an increase in memory consumption, but on the test system it happens. When I started my test, RES in top was 2244. After the test it was 185432, and after 30 minutes RES was almost the same. This is my sipp UAC scenario: stress_uac_opensips_no_rtp_cf_1_1_isup.txt
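For anyone wanting to replay a similar load profile, a sipp invocation along these lines matches the numbers above (25,000 calls, 100 simultaneous, 100 calls per second). The target address and scenario file are placeholders, not values from this thread, and run_load_test is a made-up helper name.

```shell
#!/bin/sh
# Sketch: replay the load profile described above with sipp.
# SIPP can be overridden (SIPP=echo prints the arguments instead of running).
# The target address and scenario file are placeholders, not from the thread.
run_load_test() {
    target="$1"      # host:port of the OpenSIPS instance under test
    scenario="$2"    # sipp XML scenario file
    # -m: stop after this many calls
    # -l: maximum number of simultaneous calls
    # -r: call rate in calls per second
    "${SIPP:-sipp}" "$target" -sf "$scenario" -m 25000 -l 100 -r 100
}

# Possible use:
#   run_load_test 192.0.2.10:5060 uac_scenario.xml
```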
Hello @liviuchircu!
I checked again, the memory does not leak. There was an error in the SIPP script
Cheers, @olegans1972 - do let me know if you find a way to reproduce the crash and I will resume the work here. Otherwise, the opensips.txt in-depth analysis will have to wait for the beta testing phase of 3.1, when I will be able to dedicate a good half of day's work for it... unless someone else chimes in and offers to help in the meantime, of course :)
Hi @liviuchircu! I can’t reproduce the problem in the lab. We decided to switch to version 3.0.2 and try to solve all the problems on the new version. I have already opened an issue: https://github.com/OpenSIPS/opensips/issues/2105
That problem is not related to the segfault but, as it seems to us, it could have contributed to it.
This is interesting! So you're saying that rest_client works 100% fine on 2.4.x, with no async(rest_post()) occasional timeouts?
Sometimes I saw the same problem there. But on 2.4.7 we used the default timeout of 20 sec. Probably the buffer could fill up and this led to the segfault, so now we are trying to limit it with a timer.
Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.
We have switched all our servers to version 3.0.2 and are continuing our observations there.
Any updates here? No progress has been made in the last 15 days, marking as stale. Will close this issue if no further updates are made in the next 30 days.
Marking as closed due to lack of progress for more than 30 days. If this issue is still relevant, please re-open it with additional details.
OpenSIPS version you are running
version: opensips 2.4.7 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, F_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: 597f81b
main.c compiled on 10:10:43 Mar 13 2020 with gcc 4.8.5