OpenSIPS / opensips

OpenSIPS is a GPL implementation of a multi-functionality SIP Server that aims to deliver a high-level technical solution (performance, security and quality) for use in professional SIP server platforms.
https://opensips.org

[BUG] local_route seems not to be called for locally generated CANCEL requests in the case of parallel forking #3432

Open hizbi-github opened 2 weeks ago

hizbi-github commented 2 weeks ago

OpenSIPS version you are running

```
version: opensips 3.4.6 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: cf708e1b2
main.c compiled on 18:32:54 Jul 2 2024 with gcc 12
```

Describe the bug

We need to send push notifications to the devices that don't answer the call during parallel forking, but local_route does not seem to be called for the locally generated CANCEL requests in that case.

To Reproduce

  1. Register multiple devices on the same AOR to trigger parallel forking.
  2. Define a local_route containing an xlog in the OpenSIPS configuration file.
  3. Call the AOR that has multiple registrations and answer the call on any device.
  4. OpenSIPS sends CANCELs to the other devices, but the local_route is not executed.

Expected behavior

According to the OpenSIPS documentation, local_route should be executed automatically whenever a new SIP request is generated internally by the TM module, but this is not what we observe. The debug logs below show the CANCEL being sent without any local_route being executed.

Relevant System Logs

```
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:tm:run_any_trans_callbacks: trans=0x7fe32ff0c190, callback type 1024, id 1 entered
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:parse_msg: SIP Request:
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:parse_msg: method: <CANCEL>
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:parse_msg: uri: <sip:1000@172.16.254.177:53766;ob>
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:parse_msg: version: <SIP/2.0>
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:parse_headers: flags=ffffffffffffffff
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:parse_via_param: found param type 232, <branch> = <z9hG4bK85d2.a0edb19.1>; state=16
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:parse_via: end of header reached, state=5
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:parse_headers: via found, flags=ffffffffffffffff
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:parse_headers: this is the first via
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:_parse_to: end of header reached, state=10
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:_parse_to: display={}, ruri={sip:1000@172.16.254.177}
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:get_hdr_field: <To> [26]; uri=[sip:1000@172.16.254.177]
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: [60B blob data]
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:get_hdr_field: cseq <CSeq>: <1> <CANCEL>
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:get_hdr_field: content_length=0
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:get_hdr_field: found end of header
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:dialog:dlg_onreq_out: skipping method 2
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:tm:cancel_branch: sending cancel...
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:tm:set_timer: relative timeout is 500000
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:tm:insert_timer_unsafe: [4]: 0x7fe32ff0c678 (468600000)
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:tm:insert_timer_unsafe: [0]: 0x7fe32ff0c6a8 (473)
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:tm:t_unref: UNREF_UNSAFE: [0x7fe32ff0c190] after is 0
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:destroy_avp_list: destroying list (nil)
Jul 07 22:50:18 ip-172-31-77-179 /usr/sbin/opensips[29483]: DBG:core:receive_msg: cleaning up
```

OS/environment information

bogdan-iancu commented 2 weeks ago

@hizbi-github, indeed, by design, locally generated CANCELs do not trigger local_route. Still, I understand your need to fire the PN when a CANCEL is about to be sent to an end-point because of parallel forking. As a workaround, I have a couple of suggestions, starting from the idea that instead of firing the PN when the CANCEL is sent, you can do it at a different point that indicates a CANCEL will be sent out:

1) If you do parallel forking, in onreply_route, upon receiving a 200 OK (which inevitably translates into cancelling the other branches), you can send the PN for those other branches. You can use some AVPs (one per branch, with the branch index in the AVP name) to remember the branches and their status.

2) In failure_route, if you got a locally generated 408 timeout, this means all the pending branches are to be cancelled, so you can do the same there.

I know it may not be the nicest way, but at least it is something you can do now.
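As a rough illustration of the bookkeeping behind these two suggestions, here is a minimal sketch in Python rather than OpenSIPS script. The dictionary keyed by branch index plays the role of the per-branch AVPs, and the names `BranchTracker`, `on_branch`, `on_reply` and `on_failure` are hypothetical. It assumes the caller only passes 408s that were generated locally, and it does not send any PN itself; the methods just return which devices should get one.

```python
from dataclasses import dataclass, field


@dataclass
class BranchTracker:
    """Per-call record of forked branches, keyed by branch index.

    Hypothetical model: the containers stand in for per-branch AVPs.
    """
    # branch index -> device (PN target) that the branch was sent to
    devices: dict[int, str] = field(default_factory=dict)
    # branch indexes already closed (final reply received or being cancelled)
    closed: set[int] = field(default_factory=set)

    def on_branch(self, idx: int, device: str) -> None:
        """branch_route step: remember which device branch `idx` targets."""
        self.devices[idx] = device

    def _close_pending(self) -> list[str]:
        """Return devices of still-open branches, then mark all closed."""
        targets = [dev for idx, dev in sorted(self.devices.items())
                   if idx not in self.closed]
        self.closed.update(self.devices)
        return targets

    def on_reply(self, idx: int, code: int) -> list[str]:
        """onreply_route step: return devices that need a PN (may be empty).

        A 2xx final reply on one branch means every other open branch is
        about to be cancelled, so those devices are the PN targets.
        """
        if code < 200:
            return []  # provisional reply: nothing is cancelled yet
        self.closed.add(idx)  # this branch has its final reply
        if code < 300:
            return self._close_pending()
        return []  # a negative final reply closes only this branch

    def on_failure(self, code: int) -> list[str]:
        """failure_route step: a 408 (assumed locally generated) means every
        branch that was still open got cancelled, so all of them need a PN."""
        if code == 408:
            return self._close_pending()
        return []


if __name__ == "__main__":
    call = BranchTracker()
    call.on_branch(0, "phone")
    call.on_branch(1, "desk")
    call.on_branch(2, "tablet")
    print(call.on_reply(2, 486))  # tablet rejected: []
    print(call.on_reply(0, 200))  # phone answered: ['desk']
```

Marking every branch closed once the PN targets are computed keeps a later reply or failure on the same call from notifying a device twice.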

hizbi-github commented 2 weeks ago

Hi @bogdan-iancu, thanks for the suggestion. I will try that and report back if there are any issues. Thanks again!

hizbi-github commented 1 week ago

Hi @bogdan-iancu, sorry for the late response.

I am still having an issue. We send the PNs from the request route, based on the database records (the devices registered for the AOR), but the branch_id is only available in branch_route/onreply_route. This leaves me with no way to map the two together, i.e., to know which PN belongs to which branch_id. For this to work, I need a way to attach the branch_id to each device's PN, so that I can notify the devices on the other (cancelled) branches by looking up their PN through the branch_id.

Any ideas on how to approach this? Thanks!
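One possible way to build that mapping, assuming each device's database record also stores the contact URI the device registered with, is to use the contact as the join key. In branch_route the branch's request URI ($ru) is normally the registered contact, so recording the branch index together with $ru for each branch lets you match branches to device records afterwards. The minimal Python sketch below shows only that join. The `DeviceRecord` schema and the function names are hypothetical, and comparing URIs by dropping their parameters is a simplification of full SIP URI comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceRecord:
    """One device row for an AOR (hypothetical schema)."""
    contact: str   # contact URI the device registered with (assumed column)
    pn_token: str  # push-notification token/handle for the device


def contact_key(uri: str) -> str:
    """Comparison key for a contact URI: drop parameters such as ';ob'."""
    return uri.split(";", 1)[0]


def map_branches_to_tokens(branch_uris: dict[int, str],
                           records: list[DeviceRecord]) -> dict[int, str]:
    """Join (branch index -> branch request URI) with the device rows.

    `branch_uris` is what branch_route would record for each branch.
    Branches whose URI matches no device row are left out of the result.
    """
    by_contact = {contact_key(r.contact): r.pn_token for r in records}
    mapping: dict[int, str] = {}
    for idx, uri in branch_uris.items():
        token = by_contact.get(contact_key(uri))
        if token is not None:
            mapping[idx] = token
    return mapping


if __name__ == "__main__":
    rows = [
        DeviceRecord("sip:1000@172.16.254.177:53766;ob", "token-A"),
        DeviceRecord("sip:1000@10.0.0.5:5060", "token-B"),
    ]
    branches = {0: "sip:1000@10.0.0.5:5060",
                1: "sip:1000@172.16.254.177:53766;ob"}
    print(map_branches_to_tokens(branches, rows))  # {0: 'token-B', 1: 'token-A'}
```

With this index-to-token map in hand, the branches being cancelled can be resolved to the matching PN tokens without needing the branch index in the request route.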