drachtio / drachtio-srf

drachtio signaling resource framework
https://drachtio.org
MIT License
171 stars 60 forks source link

How can Srf client know about the loss of TCP/WS connection from SIP client to Drachtio server #138

Closed jyotigroomefy closed 1 year ago

jyotigroomefy commented 1 year ago

We have an application that uses drachtio srf to connect to the drachtio server. We have situation when the TCP/WS connection of SIP client to Drachtio server is broken but it appears that there is no way drachtio server communicate that connection loss to the drachtio srf client. After this we have dialogs with disconnected sip client alive the in srf client . Now when we send sip message using these dialogs then drachtio server crashes with segmentation fault. If there is way to get destroy events with all the dialogs associated with the disconnected TCP/WS connections, it would be very helpful for application using drachtio srf.

davehorton commented 1 year ago

what version of drachtio server are you running? If not the latest, could you try with 0.8.20 and let me know if the server still crashes in this instance? If it does, I would appreciate seeing the drachtio server logs with sofia-loglevel=4

jyotigroomefy commented 1 year ago

Thanks for your prompt response Dave.

I have collected the logs. I have attached here. I am using Drachtio version 0.8.20

filtered4.txt filtered2.txt

davehorton commented 1 year ago

Thanks this log is useful. Reviewing it now. How easy is this for you to recreate?

jyotigroomefy commented 1 year ago

filtered5.txt Here is one more log file for above crashes.

davehorton commented 1 year ago

Thanks. Again -- is this easy for you to recreate? If so, I would like to ask your help in testing a fix

davehorton commented 1 year ago

Would it be possible for you to test with v0.8.21-rc1 and to forward the logs again when you recreate? I have added some logging that I hope will help me troubleshoot this further

davehorton commented 1 year ago

Also, would it possible to get more history leading up to the crash? I would like to be able to see the wss connection from when it is first made, and the logs so far all start after the client has established the wss socket connection. In any case, please just provide as much log history as possible leading up to the seg fault.

jyotigroomefy commented 1 year ago

we are testing the latest one. we will share logs if we found seg fault issue.

davehorton commented 1 year ago

ok great.

jyotigroomefy commented 1 year ago

Hi we have tested latest one v0.8.21-rc1. We have found same crashes. Sharing logs here.

filtered1.txt

davehorton commented 1 year ago

I looked over the logs but I need more history. I have to be able to track the entire history of when this specific client connected to drachtio. This log covers the period 2023-01-12 17:03:09 - 2023-01-12 17:05:15. Thats only a few minutes. Have you got the logs just preceeding this time peroid? If so please provide.

In this log, I can see the issue has to do with reference counting, but the reference counts for this connection are already off by one at the start of the log, so the issue I need to investigate and fix happened at an earlier time, The specific client in this case was connecting from wss/192.168.1.125:21268 and I need to go back to the log where he first connected

I really want to fix this bug but I need your help to preserve and provide all of the logs for a much longer period of time,

davehorton commented 1 year ago

by the way we can leave the issue open here, but it is more properly a drachtio server bug

davehorton commented 1 year ago

any chance you've been able to make this happen again, and get more log history @jyotigroomefy ?

jyotigroomefy commented 1 year ago

Yes. i am able to take full log history. Please find in the attachment.

drachtioCompleteFiltered.txt

davehorton commented 1 year ago

Excellent thanks, will have a look

davehorton commented 1 year ago

Your code is doing one thing wrong which needs to be corrected. You are responding to an incoming REGISTER with a 202 Accepted. This is not correct, a success response to a REGISTER is a 200 OK, not a 202. (A 202 is the success response for a SUBSCRIBE).

2023-01-21T06:46:55.28341336Z stdout F [0] SIP/2.0 202 Accepted
2023-01-21T06:46:55.283418339Z stdout F [0] Via: SIP/2.0/WSS consort.org;branch=z9hG4bK1356925;received=192.168.1.125;rport=22661
2023-01-21T06:46:55.283423296Z stdout F [0] From: <sip:mcx.cons-857.cons@consort.org>;tag=9f2pefdtd3
2023-01-21T06:46:55.283428524Z stdout F [0] To: <sip:mcx.cons-857.cons@consort.org>;tag=ea4vQS96QNZ7D
2023-01-21T06:46:55.2834335Z stdout F [0] Call-ID: rljqtset8cartjdbv5f6r1
2023-01-21T06:46:55.283438734Z stdout F [0] CSeq: 7942 REGISTER
2023-01-21T06:46:55.283453612Z stdout F [0] Contact: <sip:mcx.cons-857.cons@consort.org>;+g.3gpp.icsi-ref="urn%3Aurn-7%3A3gpp-service.ims.icsi.mcptt";+g.3gpp.mcptt
2023-01-21T06:46:55.283460235Z stdout F [0] Expires: 0
2023-01-21T06:46:55.283465531Z stdout F [0] Content-Length: 0
2023-01-21T06:46:55.283470693Z stdout F [0]  
2023-01-21T06:46:55.283475785Z stdout F [0] 2023-01-21 12:16:55.275963 tport.c:2345 tport_set_secondary_timer() tport(0x563333420000): reset timer

Could you please change your code to respond 200 OK when you want to respond with a success result to a REGISTER. Also please upgrade to drachtio-server version v0.8.21-rc3. After making both changes please retest and let me know the results.

v0.8.21-rc3 has a fix that may resolve this issue (but it also requires that you properly respond 200 instead of 202 to successful register requests)

jyotigroomefy commented 1 year ago

thank you :) I will change response 200 OK. I will use v0.8.21-rc3 and let you know the results.

jyotigroomefy commented 1 year ago

Hi we have changed response to 200 OK and used latest version in our application. We have tested it. now frequently crashes is not happening but got same crash yesterday. i have attached log file here.

We have gone through your code and noticed one thing in "sip-dialog-controller.cpp" file. 'pData" pointer variable is redeclared same as argument variable. i have attached code snapshot here. PHOTO-2023-01-27-11-52-09 2 PHOTO-2023-01-27-11-52-09

drachtio_log.txt

davehorton commented 1 year ago

Thanks I will take a look at your log.

As far as that statement defining a local var named pData, that is not a problem. It is not re-declaring an existing variable it is creating a new variable of the same name so in this scope that variable is what is used when we refer to pData

davehorton commented 1 year ago

you are still incorrectly sending 202 responses to REGISTER requests. Could you fix that, and provide a new log if you still experience crashes?

2023-01-25T13:57:19.295093506Z stdout F [0] 2023-01-25 19:27:19.294716 Client::read_handler read: c6d15884-7839-4f24-9883-67c92a303060|sip|73aa8742-80d4-4848-98a7-ce183f85b68f|
2023-01-25T13:57:19.295110268Z stdout F [0] SIP/2.0 202 Accepted
2023-01-25T13:57:19.295119837Z stdout F [0] Call-ID: idc38nsv5vb7096e9v5pv6
2023-01-25T13:57:19.295128731Z stdout F [0] cseq: 904 REGISTER
2023-01-25T13:57:19.295145116Z stdout F [0] from: <sip:mcx.cons-487.cons@consort.org>;tag=i175v71a1f
2023-01-25T13:57:19.295154496Z stdout F [0] to: <sip:mcx.cons-487.cons@consort.org>
2023-01-25T13:57:19.295163473Z stdout F [0] Content-Length: 0
2023-01-25T13:57:19.295172128Z stdout F [0] contact: <sip:mcx.cons-487.cons@consort.org>;+g.3gpp.icsi-ref="urn%3Aurn-7%3A3gpp-service.ims.icsi.mcptt";+g.3gpp.mcptt
2023-01-25T13:57:19.295181279Z stdout F [0] expires: 0
2023-01-25T13:57:19.2951902Z stdout F [0] 
2023-01-25T13:57:19.295198971Z stdout F [0] 
davehorton commented 1 year ago

can you give me an updated status on this? Are you having any issues after the change above?

jyotigroomefy commented 1 year ago

I am really sorry for sending logs without changing 202 status code. We have now fixed 202 status code responses to REGISTER requests and good thing we have not found any issue for last 3 days. we are continue doing testing. I will update here if we found any issue. Thank you for helping us :)

davehorton commented 1 year ago

no worries at all. Please let me know how it goes, whether you experience any crashes or hangs, and also how memory utilization looks over time

davehorton commented 1 year ago

@jyotigroomefy can you give me an update on where this stands? Is your issue fixed or is it still occurring?

davehorton commented 1 year ago

closed do to lack of response; presumed fixed.

jyotigroomefy commented 1 year ago

@davehorton it's working fine. Our issue is fixed. It hasn't occur again. Thank you :)