drachtio / drachtio-server

A SIP call processing server that can be controlled via nodejs applications
https://drachtio.org
MIT License
233 stars 90 forks source link

Garbage response when handling a 408 Request TImeout from the network #315

Open davehorton opened 8 months ago

davehorton commented 8 months ago

The scenario is this:

The interesting thing is that this can then lead to drachtio-srf having an error with the subsequent BYE on the same dialog, complaining that the message does not start with a length specifier.

Example logs:

2023-11-08 12:55:54.712956 send 843 bytes to tls/[192.150.190.58]:50013 at 12:55:54.712879:
REFER sip:+496022122334@192.150.190.58:5061;transport=tls;did=b58.2e810b86 SIP/2.0
Via: SIP/2.0/TLS 3.73.70.70;branch=z9hG4bKeZ165Zv0X7Stm
Max-Forwards: 70
From: ""International"" <sip:+991202345@sip-vg.company.ai>;tag=KtQ53B9aQaB3S
To: ""Obernburg Am Main, Germany"" <sip:+496022122334@uc.abc.com>;tag=1c1354699239
Call-ID: 3698038028112023135353@192.150.190.58
CSeq: 3 REFER
Refer-To: <sip:+492212077271025@uc.abc.com:5081>
Referred-By: <sip:+496022122334@3.73.70.70>
Content-Length: 0
X-Reason: Bot initiated a transfer.
User-To-User: 0563743a5b6f626a656374204f626a6563745d3b65623a747275653b62706e3a3136373939313236373b6e623a3b6e743a3b63723a3b73693a33643362376566372d626561622d346538612d626361342d3066663366343030376162343b%3Bencoding%3Dhex%3Bpurpose%3Disdn-uui%3Bcontent%3Disdn-uui
Contact: <sip:3.73.70.70:5061;transport=tls>

2023-11-08 12:56:04.352294 recv 360 bytes from tls/[192.150.190.58]:50013 at 12:56:04.352218:
SIP/2.0 408 Request Timeout
Via: SIP/2.0/TLS 3.73.70.70;received=3.73.70.70;rport=5061;branch=z9hG4bKeZ165Zv0X7Stm
From: ""International"" <sip:+991202345@sip-vg.company.ai>;tag=KtQ53B9aQaB3S
To: ""Obernburg Am Main, Germany"" <sip:+496022122334@uc.abc.com>;tag=1c1354699239
Call-ID: 3698038028112023135353@192.150.190.58
CSeq: 3 REFER
Content-Length: 0

2023-11-08 12:56:04.352329 nta.c:3484 agent_recv_response() nta: received 408 Request Timeout for REFER (3)
2023-11-08 12:56:04.353071 nta.c:6985 incoming_reply() nta: not sending 408 Request Timeout for REFER (75170226)

12:56:05.718292 recv 795 bytes from udp/[10.1.130.94]:5060 at 12:56:05.718186:
REFER sip:3.73.70.70:5060 SIP/2.0
Via: SIP/2.0/UDP 10.1.130.94;rport;branch=z9hG4bK2yj6UyXN2yNFD
Max-Forwards: 70
From: ""International"" <sip:+991202345@sip-vg.company.ai>;tag=KyceFZHmj9S1m
To: ""Obernburg Am Main, Germany"" <sip:+496022122334@3.73.70.70:5060>;tag=jHyc2gr7S1mge
Call-ID: b7bda645-f8d8-123c-3381-02cf2371e90b
CSeq: 75170226 REFER
Contact: <sip:10.1.130.94:5060>
Refer-To: <sip:+492212077271025@3.73.70.70>
Referred-By: <sip:+496022122334@3.73.70.70>
Content-Length: 0
X-Reason: Bot initiated a transfer.
User-To-User: 0563743a5b6f626a656374204f626a6563745d3b65623a747275653b62706e3a3136373939313236373b6e623a3b6e743a3b63723a3b73693a33643362376566372d626561622d346538612d626361342d3066663366343030376162343b%3Bencoding%3Dhex%3Bpurpose%3Disdn-uui%3Bcontent%3Disdn-uui

2023-11-08 12:56:05.718402 send 340 bytes to udp/[10.1.130.94]:5060 at 12:56:05.718343:
SIP/2.0 408 �U
Via: SIP/2.0/UDP 10.1.130.94;rport=5060;branch=z9hG4bK2yj6UyXN2yNFD
From: ""International"" <sip:+991202345@sip-vg.company.ai>;tag=KyceFZHmj9S1m
To: ""Obernburg Am Main, Germany"" <sip:+496022122334@3.73.70.70:5060>;tag=jHyc2gr7S1mge
Call-ID: b7bda645-f8d8-123c-3381-02cf2371e90b
CSeq: 75170226 REFER
Content-Length: 0

the relevant code is in nta.c of sofia

/** Send the response message.
 *
 * @note The ownership of msg is handled to incoming_reply().
 */
int incoming_reply(nta_incoming_t *irq, msg_t *msg, sip_t *sip)
{
  nta_agent_t *agent = irq->irq_agent;
  int status = sip->sip_status->st_status;
  int sending = 1;
  int *use_rport = NULL;
  int retry_without_rport = 0;
  tp_name_t *tpn, default_tpn[1];

  if (status == 408 &&
      irq->irq_method != sip_method_invite &&
      !agent->sa_pass_408 &&
      !irq->irq_default) {
    /* RFC 4320 nit-actions-03 Action 2:

   A transaction-stateful SIP element MUST NOT send a response with
   Status-Code of 408 to a non-INVITE request.  As a consequence, an
   element that can not respond before the transaction expires will not
   send a final response at all.
    */
    sending = 0;
  }

Basically, it intentionally does not send on the 408. However, the "phantom response" that is never sent, is retrieved when we get the next re-transmit of the REFER from B, and we then try to send it out but its garbage. It would seem that somehow in not sending the 408 some piece of logic that keeps the message around for later is missed, leading to the problem. I tried simply adding a msg_ref to the message above where we decide not to send but that did not work, so more investigation is needed.

For now, though, I have made this commit to enable sofia to pass us these 408 responses, which also solves the issue.