Closed jyotigroomefy closed 1 year ago
what version of drachtio server are you running? If not the latest, could you try with 0.8.20 and let me know if the server still crashes in this instance? If it does, I would appreciate seeing the drachtio server logs with sofia-loglevel=4
Thanks for your prompt response Dave.
I have collected the logs. I have attached here. I am using Drachtio version 0.8.20
Thanks this log is useful. Reviewing it now. How easy is this for you to recreate?
filtered5.txt Here is one more log file for above crashes.
Thanks. Again -- is this easy for you to recreate? If so, I would like to ask your help in testing a fix
Would it be possible for you to test with v0.8.21-rc1 and to forward the logs again when you recreate? I have added some logging that I hope will help me troubleshoot this further
Also, would it possible to get more history leading up to the crash? I would like to be able to see the wss connection from when it is first made, and the logs so far all start after the client has established the wss socket connection. In any case, please just provide as much log history as possible leading up to the seg fault.
we are testing the latest one. we will share logs if we found seg fault issue.
ok great.
Hi we have tested latest one v0.8.21-rc1. We have found same crashes. Sharing logs here.
I looked over the logs but I need more history. I have to be able to track the entire history of when this specific client connected to drachtio. This log covers the period 2023-01-12 17:03:09 - 2023-01-12 17:05:15. Thats only a few minutes. Have you got the logs just preceeding this time peroid? If so please provide.
In this log, I can see the issue has to do with reference counting, but the reference counts for this connection are already off by one at the start of the log, so the issue I need to investigate and fix happened at an earlier time, The specific client in this case was connecting from wss/192.168.1.125:21268 and I need to go back to the log where he first connected
I really want to fix this bug but I need your help to preserve and provide all of the logs for a much longer period of time,
by the way we can leave the issue open here, but it is more properly a drachtio server bug
any chance you've been able to make this happen again, and get more log history @jyotigroomefy ?
Yes. i am able to take full log history. Please find in the attachment.
Excellent thanks, will have a look
Your code is doing one thing wrong which needs to be corrected. You are responding to an incoming REGISTER with a 202 Accepted. This is not correct, a success response to a REGISTER is a 200 OK, not a 202. (A 202 is the success response for a SUBSCRIBE).
2023-01-21T06:46:55.28341336Z stdout F [0] SIP/2.0 202 Accepted
2023-01-21T06:46:55.283418339Z stdout F [0] Via: SIP/2.0/WSS consort.org;branch=z9hG4bK1356925;received=192.168.1.125;rport=22661
2023-01-21T06:46:55.283423296Z stdout F [0] From: <sip:mcx.cons-857.cons@consort.org>;tag=9f2pefdtd3
2023-01-21T06:46:55.283428524Z stdout F [0] To: <sip:mcx.cons-857.cons@consort.org>;tag=ea4vQS96QNZ7D
2023-01-21T06:46:55.2834335Z stdout F [0] Call-ID: rljqtset8cartjdbv5f6r1
2023-01-21T06:46:55.283438734Z stdout F [0] CSeq: 7942 REGISTER
2023-01-21T06:46:55.283453612Z stdout F [0] Contact: <sip:mcx.cons-857.cons@consort.org>;+g.3gpp.icsi-ref="urn%3Aurn-7%3A3gpp-service.ims.icsi.mcptt";+g.3gpp.mcptt
2023-01-21T06:46:55.283460235Z stdout F [0] Expires: 0
2023-01-21T06:46:55.283465531Z stdout F [0] Content-Length: 0
2023-01-21T06:46:55.283470693Z stdout F [0]
2023-01-21T06:46:55.283475785Z stdout F [0] 2023-01-21 12:16:55.275963 tport.c:2345 tport_set_secondary_timer() tport(0x563333420000): reset timer
Could you please change your code to respond 200 OK when you want to respond with a success result to a REGISTER. Also please upgrade to drachtio-server version v0.8.21-rc3. After making both changes please retest and let me know the results.
v0.8.21-rc3 has a fix that may resolve this issue (but it also requires that you properly respond 200 instead of 202 to successful register requests)
thank you :) I will change response 200 OK. I will use v0.8.21-rc3 and let you know the results.
Hi we have changed response to 200 OK and used latest version in our application. We have tested it. now frequently crashes is not happening but got same crash yesterday. i have attached log file here.
We have gone through your code and noticed one thing in "sip-dialog-controller.cpp" file. 'pData" pointer variable is redeclared same as argument variable. i have attached code snapshot here.
Thanks I will take a look at your log.
As far as that statement defining a local var named pData, that is not a problem. It is not re-declaring an existing variable it is creating a new variable of the same name so in this scope that variable is what is used when we refer to pData
you are still incorrectly sending 202 responses to REGISTER requests. Could you fix that, and provide a new log if you still experience crashes?
2023-01-25T13:57:19.295093506Z stdout F [0] 2023-01-25 19:27:19.294716 Client::read_handler read: c6d15884-7839-4f24-9883-67c92a303060|sip|73aa8742-80d4-4848-98a7-ce183f85b68f|
2023-01-25T13:57:19.295110268Z stdout F [0] SIP/2.0 202 Accepted
2023-01-25T13:57:19.295119837Z stdout F [0] Call-ID: idc38nsv5vb7096e9v5pv6
2023-01-25T13:57:19.295128731Z stdout F [0] cseq: 904 REGISTER
2023-01-25T13:57:19.295145116Z stdout F [0] from: <sip:mcx.cons-487.cons@consort.org>;tag=i175v71a1f
2023-01-25T13:57:19.295154496Z stdout F [0] to: <sip:mcx.cons-487.cons@consort.org>
2023-01-25T13:57:19.295163473Z stdout F [0] Content-Length: 0
2023-01-25T13:57:19.295172128Z stdout F [0] contact: <sip:mcx.cons-487.cons@consort.org>;+g.3gpp.icsi-ref="urn%3Aurn-7%3A3gpp-service.ims.icsi.mcptt";+g.3gpp.mcptt
2023-01-25T13:57:19.295181279Z stdout F [0] expires: 0
2023-01-25T13:57:19.2951902Z stdout F [0]
2023-01-25T13:57:19.295198971Z stdout F [0]
can you give me an updated status on this? Are you having any issues after the change above?
I am really sorry for sending logs without changing 202 status code. We have now fixed 202 status code responses to REGISTER requests and good thing we have not found any issue for last 3 days. we are continue doing testing. I will update here if we found any issue. Thank you for helping us :)
no worries at all. Please let me know how it goes, whether you experience any crashes or hangs, and also how memory utilization looks over time
@jyotigroomefy can you give me an update on where this stands? Is your issue fixed or is it still occurring?
closed do to lack of response; presumed fixed.
@davehorton it's working fine. Our issue is fixed. It hasn't occur again. Thank you :)
We have an application that uses drachtio srf to connect to the drachtio server. We have situation when the TCP/WS connection of SIP client to Drachtio server is broken but it appears that there is no way drachtio server communicate that connection loss to the drachtio srf client. After this we have dialogs with disconnected sip client alive the in srf client . Now when we send sip message using these dialogs then drachtio server crashes with segmentation fault. If there is way to get destroy events with all the dialogs associated with the disconnected TCP/WS connections, it would be very helpful for application using drachtio srf.