drachtio / drachtio-server

A SIP call processing server that can be controlled via nodejs applications
https://drachtio.org
MIT License
233 stars 90 forks source link

18x provisional response in re-INVITE breaks signalling #350

Open danvirsen opened 3 months ago

danvirsen commented 3 months ago

When a re-INVITE gets a 18x provisional response, Drachtio doesn't accept the 200 OK when it arrives with the message: unknown response (possibly late arriving?) - discarding. This causes the client to retransmit the 200 OK until it times out.
Everything seems to work fine if the client only send a 100 Trying, but we have clients that send either 180 or 183 before their 200.

The only way I could test this locally was by modifying a WebRTC client so the test case isn't exactly like it was reported to me but I still got the error so I think it's good.

Log of the scenario starting from the incoming 183 Session Progress response to the re-INVITE (Drachtio log set to debug and Sofia to 9):

2024-04-26 10:45:36.058628 recv 350 bytes from wss/[1.2.3.4]:53279 at 10:45:36.058551:
SIP/2.0 183 Session Progress
Via: SIP/2.0/WSS 9.9.9.9:3447;branch=z9hG4bKF0548QF9aXNFg
From: <sip:tqd@3.4.5.6>;tag=FZ49vQDZt34vB
To: "user.1" <sip:user.1@2.3.4.5>;tag=8tmkdjgq5e
CSeq: 32 INVITE
Call-ID: 8rg6rucld80evfp4prhl
Supported: outbound
Content-Length: 0

2024-04-26 10:45:36.058647 tport.c:3064 tport_deliver() tport_deliver(0x56250b26c330): msg 0x56250b2e2210 (350 bytes) from wss/1.2.3.4:53279 next=(nil)
2024-04-26 10:45:36.058657 tport.c:1157 tport_ref() tport_ref(0x56250b26c330): refcount is now 35
2024-04-26 10:45:36.058668 nta.c:3469 agent_recv_response() nta: received 183 Session Progress for INVITE (32)
2024-04-26 10:45:36.058680 nta.c:3536 agent_recv_response() nta: 183 Session Progress is going to a transaction
2024-04-26 10:45:36.058690 tport.c:4277 tport_release() tport_release(0x56250b26c330): 0x56250b2e28f0 by 0x56250b2e3050 with 0x56250b2e2210 (preliminary)
2024-04-26 10:45:36.058700 SipDialogController::processResponseInsideDialog:
2024-04-26 10:45:36.058711 SipDialogController::findRIPByOrq orq 0x56250b2e3050
2024-04-26 10:45:36.058744 SipDialogController::processResponseInsideDialog: found request for response
2024-04-26 10:45:36.058764 tport.c:1157 tport_ref() tport_ref(0x56250b26c330): refcount is now 36
2024-04-26 10:45:36.058777 tport.c:1170 tport_unref() tport_unref(0x56250b26c330): refcount is now 35
2024-04-26 10:45:36.058799 tport.c:1157 tport_ref() tport_ref(0x56250b26c330): refcount is now 36
2024-04-26 10:45:36.058808 SipDialogController::clearRIP clearing orq 0x56250b2e3050
2024-04-26 10:45:36.058819 nta.c:8914 outgoing_free() nta: outgoing_free(0x56250b2e3050)
2024-04-26 10:45:36.058832 tport.c:4277 tport_release() tport_release(0x56250b26c330): 0x56250b2e28f0 by 0x56250b2e3050 with (nil)
2024-04-26 10:45:36.058842 tport.c:1170 tport_unref() tport_unref(0x56250b26c330): refcount is now 35
2024-04-26 10:45:36.058861 tport.c:1170 tport_unref() tport_unref(0x56250b26c330): refcount is now 34
2024-04-26 10:45:36.058873 tport.c:2324 tport_set_secondary_timer() tport(0x56250b26c330): reset timer
2024-04-26 10:45:36.058963 Client::write_handler - wrote 526 bytes: system:0
2024-04-26 10:45:37.314202 nta.c:7302 _nta_incoming_timer() nta: timer I fired, terminate 200 response
2024-04-26 10:45:37.314267 tport.c:1170 tport_unref() tport_unref(0x56250b26c330): refcount is now 33
2024-04-26 10:45:37.314282 nta.c:5994 incoming_reclaim_queued() incoming_reclaim_all((nil), (nil), 0x7ffe92fb0d60)
2024-04-26 10:45:37.314308 nta.c:7356 _nta_incoming_timer() nta_incoming_timer: 0/0 resent, 0/0 tout, 1/1 term, 1/2 free
2024-04-26 10:45:37.314319 nta.c:1367 agent_timer() nta: timer set next to 26996 ms
2024-04-26 10:45:37.402543 tport.c:2814 tport_wakeup() tport_wakeup(0x56250b26c330): events IN
2024-04-26 10:45:37.402606 tport.c:2905 tport_recv_event() tport_recv_event(0x56250b26c330)
2024-04-26 10:45:37.402647 ws.c:367 ws_raw_read() ws_raw_read(0x56250b26c520): SSL_read returned 9, blocking 0
2024-04-26 10:45:37.402662 ws.c:421 ws_raw_read() ws_raw_read(0x56250b26c520): returning 9 wsh->x: 1
2024-04-26 10:45:37.402675 ws.c:367 ws_raw_read() ws_raw_read(0x56250b26c520): SSL_read returned 1, blocking 1
2024-04-26 10:45:37.402687 ws.c:421 ws_raw_read() ws_raw_read(0x56250b26c520): returning 1 wsh->x: 1
2024-04-26 10:45:37.402696 ws.c:1154 ws_read_frame() ws_read_frame(0x56250b25fc30): returning 4
2024-04-26 10:45:37.402713 tport.c:3246 tport_recv_iovec() tport_recv_iovec(0x56250b26c330) msg 0x56250b2aa110 from (wss/1.2.3.4:53279) has 4 bytes, veclen = 1
2024-04-26 10:45:37.402774 tport.c:3064 tport_deliver() tport_deliver(0x56250b26c330): bad msg 0x56250b2aa110 (4 bytes) from wss/1.2.3.4:53279 next=(nil)
2024-04-26 10:45:37.402786 tport.c:1157 tport_ref() tport_ref(0x56250b26c330): refcount is now 34
2024-04-26 10:45:37.402797 tport.c:1170 tport_unref() tport_unref(0x56250b26c330): refcount is now 33
2024-04-26 10:45:37.402809 tport.c:2324 tport_set_secondary_timer() tport(0x56250b26c330): reset timer
2024-04-26 10:45:38.067530 tport.c:2814 tport_wakeup() tport_wakeup(0x56250b26c330): events IN
2024-04-26 10:45:38.067770 tport.c:2905 tport_recv_event() tport_recv_event(0x56250b26c330)
2024-04-26 10:45:38.067812 ws.c:367 ws_raw_read() ws_raw_read(0x56250b26c520): SSL_read returned -1, blocking 0
2024-04-26 10:45:38.067830 ws.c:374 ws_raw_read() ws_raw_read(0x56250b26c520): not blocking and SSL_ERROR_WANT_READ, returning -2
2024-04-26 10:45:38.067839 ws.c:421 ws_raw_read() ws_raw_read(0x56250b26c520): returning -2 wsh->x: 1
2024-04-26 10:45:38.067852 tport.c:2324 tport_set_secondary_timer() tport(0x56250b26c330): reset timer
2024-04-26 10:45:38.068545 tport.c:2814 tport_wakeup() tport_wakeup(0x56250b26c330): events IN
2024-04-26 10:45:38.068580 tport.c:2905 tport_recv_event() tport_recv_event(0x56250b26c330)
2024-04-26 10:45:38.068624 ws.c:367 ws_raw_read() ws_raw_read(0x56250b26c520): SSL_read returned 9, blocking 0
2024-04-26 10:45:38.068638 ws.c:421 ws_raw_read() ws_raw_read(0x56250b26c520): returning 9 wsh->x: 2
2024-04-26 10:45:38.068651 ws.c:367 ws_raw_read() ws_raw_read(0x56250b26c520): SSL_read returned 5174, blocking 1
2024-04-26 10:45:38.068660 ws.c:421 ws_raw_read() ws_raw_read(0x56250b26c520): returning 5174 wsh->x: 1
2024-04-26 10:45:38.068692 ws.c:1154 ws_read_frame() ws_read_frame(0x56250b25fc30): returning 5175
2024-04-26 10:45:38.068732 tport.c:3246 tport_recv_iovec() tport_recv_iovec(0x56250b26c330) msg 0x56250b2aa110 from (wss/1.2.3.4:53279) has 5175 bytes, veclen = 1
2024-04-26 10:45:38.068979 recv 5175 bytes from wss/[1.2.3.4]:53279 at 10:45:38.068745:
SIP/2.0 200 OK
Via: SIP/2.0/WSS 9.9.9.9:3447;branch=z9hG4bKF0548QF9aXNFg
From: <sip:tqd@3.4.5.6>;tag=FZ49vQDZt34vB
To: "user.1" <sip:user.1@2.3.4.5>;tag=8tmkdjgq5e
CSeq: 32 INVITE
Call-ID: 8rg6rucld80evfp4prhl
Supported: outbound
Contact: <sip:user.1@rrjq9f4mmqcr.invalid;transport=wss;ob>
Content-Type: application/sdp
Content-Length: 4744

v=0
o=- 9173172651639938085 3 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 49292 UDP/TLS/RTP/SAVPF 9 0 8 126
......
m=video 57821 UDP/TLS/RTP/SAVPF 102 104 106 108 127 39 112
......
2024-04-26 10:45:38.069017 tport.c:3064 tport_deliver() tport_deliver(0x56250b26c330): msg 0x56250b2aa110 (5175 bytes) from wss/1.2.3.4:53279 next=(nil)
2024-04-26 10:45:38.069031 tport.c:1157 tport_ref() tport_ref(0x56250b26c330): refcount is now 34
2024-04-26 10:45:38.069044 nta.c:3469 agent_recv_response() nta: received 200 OK for INVITE (32)
2024-04-26 10:45:38.069057 nta.c:3566 agent_recv_response() nta: 200 OK to message callback
2024-04-26 10:45:38.069069 processMessageStatelessly - incoming message with call-id 8rg6rucld80evfp4prhl does not match an existing call leg, processed in thread 140401780577600
2024-04-26 10:45:38.069082 tport.c:1157 tport_ref() tport_ref(0x56250b26c330): refcount is now 35
2024-04-26 10:45:38.069093 tport.c:1170 tport_unref() tport_unref(0x56250b26c330): refcount is now 34
2024-04-26 10:45:38.069104 SipProxyController::processResponse 200 8rg6rucld80evfp4prhl
2024-04-26 10:45:38.069120 processMessageStatelessly - unknown response (possibly late arriving?) - discarding

... Retransmissions of the 200 OK keeps coming from the client ...

I modified processRequestInsideDialog in sip-dialog-controller.cpp to only call clearRIP when sip->sip_status->st_status is not 1xx just to try it out and that seems to fix the problem. However, I'm not familiar with the code base so I'm not sure if that's the proper place to make this kind of change. Any potential fix might belong somewhere else since sending a 100 Trying doesn't break anything.

davehorton commented 3 months ago

what version are you running? This is quite strange, as this scenario is quite common and no bug has ever been reported. Please test and provide results of your tests when running latest version which is 0.8.25

danvirsen commented 3 months ago

Hi! Thanks for the quick response! For this test I was running 0.8.25.

davehorton commented 3 months ago

In that case, please submit the full log including the INVITE, preferably at debug level

danvirsen commented 3 months ago

Here you go: drachtio-183-on-reinvite.log

Drachtio log was set to debug.

danvirsen commented 2 months ago

Hi again! Have you had a chance to take a look at this? Do you need anything more from me?