OpenSIPS / opensips

OpenSIPS is a GPL implementation of a multi-functionality SIP Server that targets to deliver a high-level technical solution (performance, security and quality) to be used in professional SIP server platforms.
https://opensips.org
Other
1.28k stars 583 forks source link

[BUG] Callcenter dropping calls and agent locked forever in "incall" state #3123

Closed kosniaz closed 1 year ago

kosniaz commented 1 year ago

TL;DR In Opensips 3.1.16, we have implemented a call center that works well after the first few thousand calls. Then calls start disappearing and agents get locked in state "incall". We suspect it has something to do with some hash overflow. We would have liked to at least be able to release the locked agents. Only way to fix this is to restart every 24 hours.

OpenSIPS version you are running

version: opensips 3.1.16 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: b75cefb25
main.c compiled on  with gcc 8

Describe the bug

We have set up Opensips to receive calls on a specific number bridge them to one of our 100 SIP agents. We use the call center module, and it works well if traffic is relatively low (about 25 incoming calls per minute). However, after enough number of calls, especially when traffic is higher, i.e. up to 60 incoming calls per minute, we see calls getting rejected because of cc_handle_call() failing with error message:

DBG:b2b_entities:server_new: It is a retransmission, drop
ERROR:b2b_logic:b2b_process_scenario_init: failed to create new b2b server instance

Unfortunately, every time this happens, an agent's status gets stuck to "incall" forever, even though no cc_calls row includes him. So that agent is lost. Gradually, we lose a lot of agents and the call center is not responsive anymore.

We are running in UDP mode, using 6 UDP workers. I’m attaching the configuration file.

To Reproduce

Setup the opensips DB with opensips-cli. Create tables cc_flows and cc_agents tables and populate them. Start the opensips server and wait for calls to emerge. After about 10000 SIP calls and in high traffic periods (~60 incoming per minute) the error starts emerging.

Expected behavior

Incoming calls should be bridged with free agents. In case of error in the call center or b2b logic modules, the selected agent should be released.

Relevant System Logs

Jun 29 12:07:18 [16] DBG:call_center:w_handle_call: new destination for call(0x7f9b0ec09e68) is sip:prod-agent46@10.32.97.104 (state=6)
Jun 29 12:07:18 [16] DBG:call_center:set_call_leg: call 0x7f9b0ec09e68 moving to sip:prod-agent46@10.32.97.104 , state 6
Jun 29 12:07:18 [16] DBG:b2b_logic:get_scenario_id_list: scenario id = call center
Jun 29 12:07:18 [16] DBG:core:parse_headers: flags=ffffffffffffffff
Jun 29 12:07:18 [16] DBG:b2b_logic:udh_to_uri: user:host:port [+39011][10.32.97.104][5060]
Jun 29 12:07:18 [16] DBG:core:parse_headers: flags=ffffffffffffffff
Jun 29 12:07:18 [16] DBG:b2b_logic:b2bl_insert_new: Fewer parameters, expected [2] received [1]
Jun 29 12:07:18 [16] DBG:b2b_logic:b2bl_insert_new: hash index [667]:
Jun 29 12:07:18 [16] DBG:b2b_logic:b2bl_insert_new: id [0]Jun 29 12:07:18 [12] DBG:core:_parse_to: end of header reached, state=10
Jun 29 12:07:18 [16] DBG:b2b_logic:b2bl_insert_new: id [1]Jun 29 12:07:18 [12] DBG:core:_parse_to: display={}, ruri={sip:prod-agent55@10.32.97.104}
Jun 29 12:07:18 [16] DBG:b2b_logic:b2bl_insert_new: new tuple [0x7f9b0eba2f88]->[667.1]
Jun 29 12:07:18 [16] DBG:core:parse_headers: flags=ffffffffffffffff
Jun 29 12:07:18 [16] DBG:tm:t_newtran: transaction on entrance=0xffffffffffffffff
Jun 29 12:07:18 [16] DBG:core:parse_headers: flags=ffffffffffffffff
Jun 29 12:07:18 [16] DBG:core:parse_headers: flags=78
Jun 29 12:07:18 [16] DBG:tm:t_lookup_request: start searching: hash=29252, isACK=0
Jun 29 12:07:18 [16] DBG:tm:matching_3261: RFC3261 transaction matched, tid=3345256582SU5JTi9pM3NpcC4t_4121592427_
Jun 29 12:07:18 [16] DBG:tm:t_lookup_request: REF_UNSAFE:[0x7f9b0ebf3a58] after is 3
Jun 29 12:07:18 [16] DBG:tm:t_lookup_request: transaction found (T=0x7f9b0ebf3a58)
Jun 29 12:07:18 [16] DBG:tm:t_retransmit_reply: buf=0x7f9b0e7ba880: SIP/2.0 1..., shmem=0x7f9b0ec164b8: SIP/2.0 1
Jun 29 12:07:18 [16] DBG:tm:t_unref: UNREF_UNSAFE: [0x7f9b0ebf3a58] after is 2
Jun 29 12:07:18 [16] DBG:b2b_entities:server_new: It is a retransmission, drop
Jun 29 12:07:18 [16] ERROR:b2b_logic:b2b_process_scenario_init: failed to create new b2b server instance
Jun 29 12:07:18 [16] DBG:b2b_logic:b2bl_delete: Delete record [0x7f9b0eba2f88]->[667.1], hash_index=[667], local_index=[1]
Jun 29 12:07:18 [16] ERROR:call_center:set_call_leg: failed to init new b2bua call (empty ID received)
Jun 29 12:07:18 [16] ERROR:call_center:w_handle_call: failed to set new destination for call
Jun 29 12:07:18 [16] DBG:call_center:print_call_list: [667.0] - 0x7f9b0ebaf7e0
Jun 29 12:07:18 [16] DBG:call_center:print_call_list: [811.0] - 0x7f9b0eb85ee8
Jun 29 12:07:18 [16] DBG:call_center:print_call_list: [793.0] - 0x7f9b0ec0c838
Jun 29 12:07:18 [16] DBG:call_center:print_call_list: [740.1] - 0x7f9b0ec05720
Jun 29 12:07:18 [16] DBG:call_center:print_call_list: [247.0] - 0x7f9b0ec67f10
Jun 29 12:07:18 [16] DBG:call_center:print_call_list: [707.0] - 0x7f9b0ec728c8
Jun 29 12:07:18 [16] DBG:call_center:print_call_list: [10.0] - 0x7f9b0eba5910
Jun 29 12:07:18 [16] DBG:call_center:print_call_list: [748.0] - 0x7f9b0eb900f8
Jun 29 12:07:18 [16] DBG:call_center:print_call_list: [740.0] - 0x7f9b0ebc2ac8
Jun 29 12:07:18 [16] DBG:call_center:print_call_list: [166.0] - 0x7f9b0eb89a80
Jun 29 12:07:18 [16] DBG:call_center:print_call_list: [155.0] - 0x7f9b0ebdc800
Jun 29 12:07:18 [16] DBG:call_center:print_call_list: [467.0] - 0x7f9b0eb922b0
Jun 29 12:07:18 [16] DBG:call_center:print_call_list: [702.0] - 0x7f9b0ec016c0
Jun 29 12:07:18 [16] DBG:call_center:print_call_list: [156.0] - 0x7f9b0eb87d58
Jun 29 12:07:18 [16] DBG:call_center:print_call_list: [712.0] - 0x7f9b0ec1d550
Jun 29 12:07:18 [16] DBG:call_center:print_call_list: [322.0] - 0x7f9b0ec70630
Jun 29 12:07:18 [16] DBG:call_center:free_cc_call: free call 0x7f9b0ec09e68, []
Jun 29 12:07:18 [16] DBG:core:MD5StringArray: MD5 calculated: e993738eba07178750e08eb6d39aa445
Jun 29 12:07:18 [16] DBG:core:parse_headers: flags=ffffffffffffffff
Jun 29 12:07:18 [16] DBG:core:destroy_avp_list: destroying list (nil)
Jun 29 12:07:18 [16] DBG:core:receive_msg: cleaning up

OS/environment information

Additional context

opensips.cfg file

#
# OpenSIPS residential configuration script
#     by OpenSIPS Solutions <team@opensips-solutions.com>
#
# This script was generated via "make menuconfig", from
#   the "Residential" scenario.
# You can enable / disable more features / functionalities by
#   re-generating the scenario with different options.#
#
# Please refer to the Core CookBook at:
#      https://opensips.org/Resources/DocsCookbooks
# for a explanation of possible statements, functions and parameters.
#

####### Global Parameters #########

/* uncomment the following lines to enable debugging */
debug_mode=yes

log_level=3
xlog_level=3
log_stderror=no
log_facility=LOG_LOCAL0

udp_workers=6

/* uncomment the next line to enable the auto temporary blacklisting of 
   not available destinations (default disabled) */
#disable_dns_blacklist=no

/* uncomment the next line to enable IPv6 lookup after IPv4 dns 
   lookup failures (default disabled) */
#dns_try_ipv6=yes

socket=udp:<our_ip>:5060  # CUSTOMIZE ME
alias=udp:<our_ip>:5060

####### Modules Section ########

#set module path
mpath="/usr/lib/x86_64-linux-gnu/opensips/modules/"

#### SIGNALING module
loadmodule "signaling.so"

#### StateLess module
loadmodule "sl.so"

#### Transaction Module
loadmodule "tm.so"
modparam("tm", "fr_timeout", 5)
modparam("tm", "fr_inv_timeout", 30)
modparam("tm", "restart_fr_on_each_reply", 0)
modparam("tm", "onreply_avp_mode", 1)

#### Record Route Module
loadmodule "rr.so"
/* do not append from tag to the RR (no need for this script) */
modparam("rr", "append_fromtag", 0)

#### MAX ForWarD module
loadmodule "maxfwd.so"

#### SIP MSG OPerationS module
loadmodule "sipmsgops.so"

#### FIFO Management Interface
loadmodule "mi_fifo.so"
modparam("mi_fifo", "fifo_name", "/tmp/opensips_fifo")
modparam("mi_fifo", "fifo_mode", 0666)

#### USeR LOCation module
loadmodule "usrloc.so"
modparam("usrloc", "nat_bflag", "NAT")
modparam("usrloc", "working_mode_preset", "single-instance-no-db")

#### REGISTRAR module
loadmodule "registrar.so"
modparam("registrar", "tcp_persistent_flag", "TCP_PERSISTENT")
/* uncomment the next line not to allow more than 10 contacts per AOR */
#modparam("registrar", "max_contacts", 10)

#### ACCounting module
loadmodule "acc.so"
/* what special events should be accounted ? */
modparam("acc", "early_media", 0)
modparam("acc", "report_cancels", 0)
/* by default we do not adjust the direct of the sequential requests.
   if you enable this parameter, be sure to enable "append_fromtag"
   in "rr" module */
modparam("acc", "detect_direction", 0)

loadmodule "proto_udp.so"

#### HTTPD module
loadmodule "httpd.so"
modparam("httpd", "ip", "0.0.0.0")

#### MI HTTP module
loadmodule "mi_http.so"

#### database connection module

loadmodule "db_mysql.so"

loadmodule "auth.so"
loadmodule "auth_db.so"

modparam("usrloc", "db_url", "mysql://opensips:pass@0.0.0.0/opensips")
modparam("auth_db", "db_url", "mysql://opensips:pass@0.0.0.0/opensips")
modparam("auth_db", "use_domain", 1)
modparam("auth_db", "calculate_ha1", 1)
modparam("auth_db", "password_column", "password")
modparam("auth_db", "load_credentials", "")

#### call center module
loadmodule "b2b_entities.so"
loadmodule "b2b_logic.so"
loadmodule "call_center.so"
modparam("call_center", "db_url","mysql://opensips:pass@0.0.0.0/opensips") 
modparam("call_center", "acc_db_url","mysql://opensips:pass@0.0.0.0/opensips")
modparam("b2b_logic","script_scenario", "/etc/opensips/scenario_callcenter.xml")
#modparam("b2b_logic","script_scenario", "/etc/opensips/scenario_refer.xml")
modparam("call_center", "b2b_scenario", "call center")
modparam("call_center", "wrapup_time", 0)
####### Routing Logic ########

# main request routing logic
#
#
#  login one agent

route{

    # requests for my domain

    if (is_method("REGISTER")) {
           # authenticate the REGISTER requests
           if (!www_authorize("", "subscriber")) {
            www_challenge("", "auth");
            exit;
           }

           if (!save("location"))
            sl_reply_error();

           exit;
    }

    if (is_method("INVITE") and !has_totag() and ($(tU) == "+39011") ) {
        if (!cc_handle_call("phonebanking")) {
            send_reply(403,"Cannot handle call");
            exit;
        }
    } else {

        # when routing via usrloc, log the missed calls also
        if (!mf_process_maxfwd_header(10)) {
            send_reply(483,"Too Many Hops");
            exit;
        }

        if (has_totag()) {

            # handle hop-by-hop ACK (no routing required)
            if ( is_method("ACK") && t_check_trans() ) {
                t_relay();
                exit;
            }

            # sequential request within a dialog should
            # take the path determined by record-routing
            if ( !loose_route() ) {
                # we doerecord-routing for all our traffic, so we should not
                # receive any sequential requests without Route hdr.
                send_reply(404,"Not here");
                exit;
            }

            if (is_method("BYE")) {
                # do accounting even if the transaction fails
                do_accounting("log","failed");
            }

            # route it out to whatever destination was set by loose_route()
            # in $du (destination URI).
            route(relay);
            exit;
        }

        # CANCEL processing
        if (is_method("CANCEL")) {
            if (t_check_trans())
                t_relay();
            exit;
        }

        # absorb retransmissions, but do not create transaction
        t_check_trans();

        if ( !(is_method("REGISTER")  ) ) {

            if (is_myself("$fd")) {

            } else {
                # if caller is not local, then called number must be local

                if (!is_myself("$rd")) {
                    send_reply(403,"Relay Forbidden");
                    exit;
                }
            }

        }

        # preloaded route checking
        if (loose_route()) {
            xlog("L_ERR",
                "Attempt to route with preloaded Route's [$fu/$tu/$ru/$ci]");
            if (!is_method("ACK"))
                send_reply(403,"Preload Route denied");
            exit;
        }

        #record routing
        if (!is_method("REGISTER|MESSAGE"))
            record_route();

        # account only INVITEs
        if (is_method("INVITE")) {

            do_accounting("log");
        }

        if (!is_myself("$rd")) {
            append_hf("P-hint: outbound\r\n"); 

            route(relay);
        }

        # requests for my domain

        if (is_method("PUBLISH|SUBSCRIBE")) {
            send_reply(503, "Service Unavailable");
            exit;
        }

        if (is_method("REGISTER")) {
            # store the registration and generate a SIP reply
            if (!save("location"))
                xlog("failed to register AoR $tu\n");

            exit;
        }

        if ($rU==NULL) {
            # request with no Username in RURI
            send_reply(484,"Address Incomplete");
            exit;
        }

        # do lookup with method filtering
        if (!lookup("location","m")) {
            t_reply(404, "Not Found");
            exit;
        }

        do_accounting("log","missed");
        route(relay);
    }

}

route[relay] {
    # for INVITEs enable some additional helper routes
    if (is_method("INVITE")) {
        t_on_branch("per_branch_ops");
        t_on_reply("handle_nat");
        t_on_failure("missed_call");
    }

    if (!t_relay()) {
        send_reply(500,"Internal Error");
    }
    exit;
}

branch_route[per_branch_ops] {
    xlog("new branch at $ru\n");
}

onreply_route[handle_nat] {
    xlog("incoming reply\n");
}

failure_route[missed_call] {
    if (t_was_cancelled()) {
        exit;
    }

    # uncomment the following lines if you want to block client 
    # redirect based on 3xx replies.
    ##if (t_check_status("3[0-9][0-9]")) {
    ##t_reply(404,"Not found");
    ##  exit;
    ##}

}

Dockerfile

FROM debian:buster

USER root

# Set Environment Variables
ENV DEBIAN_FRONTEND noninteractive

ARG OPENSIPS_VERSION=3.1
ARG OPENSIPS_BUILD=releases

#install basic components
RUN apt-get -y update -qq && apt-get -y install gnupg2 ca-certificates

#add keyserver, repository
RUN apt-key adv --fetch-keys https://apt.opensips.org/pubkey.gpg
RUN echo "deb https://apt.opensips.org buster ${OPENSIPS_VERSION}-${OPENSIPS_BUILD}" >/etc/apt/sources.list.d/opensips.list

RUN apt-get -y update -qq && apt-get -y install opensips

ARG OPENSIPS_CLI=false
RUN if [ ${OPENSIPS_CLI} = true ]; then \
    echo "deb https://apt.opensips.org buster cli-nightly" >/etc/apt/sources.list.d/opensips-cli.list \
    && apt-get -y update -qq && apt-get -y install opensips-cli \
    ;fi

ARG OPENSIPS_EXTRA_MODULES
RUN if [ -n "${OPENSIPS_EXTRA_MODULES}" ]; then \
    apt-get -y install ${OPENSIPS_EXTRA_MODULES} \
    ;fi

RUN rm -rf /var/lib/apt/lists/*
RUN sed -i "s/^\(socket\|listen\)=udp.*5060/\1=udp:eth0:5060/g" /etc/opensips/opensips.cfg

EXPOSE 5060/udp

ENTRYPOINT ["/usr/sbin/opensips", "-FE"]

Makefile

NAME ?= opensips
OPENSIPS_VERSION ?= 3.1
OPENSIPS_BUILD ?= releases
OPENSIPS_DOCKER_TAG ?= kubaras/ilsp-opensips:3.2.13
OPENSIPS_CLI ?= true
OPENSIPS_EXTRA_MODULES ?= opensips-mysql-module default-mysql-client vim opensips-http-modules opensips-b2bua-module

all: build start

.PHONY: build start
build:
    docker build \
        --build-arg=OPENSIPS_BUILD=$(OPENSIPS_BUILD) \
        --build-arg=OPENSIPS_VERSION=$(OPENSIPS_VERSION) \
        --build-arg=OPENSIPS_CLI=${OPENSIPS_CLI} \
        --build-arg=OPENSIPS_EXTRA_MODULES="${OPENSIPS_EXTRA_MODULES}" \
        --tag="$(OPENSIPS_DOCKER_TAG)" \
        .

start:
    docker run -d --name $(NAME) opensips/opensips:$(OPENSIPS_DOCKER_TAG)
bogdan-iancu commented 1 year ago

This may be a scripting issue combined with a poor error handling in the call_center module. In script, your CC trigger has no incoming retransmission protection. Try placing a t_newtran() before invoking CC.

kosniaz commented 1 year ago

Thank you for the answer! I was about to post that we managed to solved our issue, exactly as you mentioned!

Turns out that when the proxy was dealing with heavy load, the invites from genesys were re-sent, as opensips didn't send 100 Trying until 1-2 seconds had passed. So we fixed it by creating the transaction before running cc_handle_call, which made the next invite to be dropped. We had already found that we had to use t_check_trans() but most time the transaction hadn't been created because it was no use.

kosniaz commented 1 year ago

MicrosoftTeams-image (1)

Before the fix

kosniaz commented 1 year ago

After the fix :)

MicrosoftTeams-image (2)

prathibhacdac commented 7 months ago

Can you share the corrected oepnsips.cfg file?