OpenSIPS / opensips

OpenSIPS is a GPL implementation of a multi-functionality SIP Server that targets to deliver a high-level technical solution (performance, security and quality) to be used in professional SIP server platforms.
https://opensips.org
Other
1.26k stars 577 forks source link

[BUG] clusterer segfault when DB entry for clusterer can't be resolved and node_id for the local node is higher than the remote which can't be resolved #3473

Open tommybrecher opened 1 day ago

tommybrecher commented 1 day ago

OpenSIPS version you are running

Describe the bug

When starting opensips, if a DNS entry exists in the url field of the clusterer DB table which can't be resolved (missing DNS entry), opensips will segfault in sync.c:97 (queue_sync_request). This happens because cluster->current_node is NULL, resulting in a segmentation fault when trying to access ->flags

┌──sync.c──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│   83                                  return node->node_id;                                                                                                                                                                                                  │
│   84                          }                                                                                                                                                                                                                              │
│   85                                                                                                                                                                                                                                                         │
│   86                          lock_release(node->lock);                                                                                                                                                                                                      │
│   87                  }                                                                                                                                                                                                                                      │
│   88                                                                                                                                                                                                                                                         │
│   89                  return 0;                                                                                                                                                                                                                              │
│   90          }                                                                                                                                                                                                                                              │
│   91                                                                                                                                                                                                                                                         │
│   92          int queue_sync_request(cluster_info_t *cluster, struct local_cap *lcap)                                                                                                                                                                        │
│   93          {                                                                                                                                                                                                                                              │
│   94                  lock_get(cluster->lock);                                                                                                                                                                                                               │
│   95                  lcap->flags |= CAP_SYNC_PENDING;                                                                                                                                                                                                       │
│   96                                                                                                                                                                                                                                                         │
│  >97                  if (cluster->current_node->flags & NODE_IS_SEED)                                                                                                                                                                                       │
│   98                          gettimeofday(&lcap->sync_req_time, NULL);                                                                                                                                                                                      │
│   99                                                                                                                                                                                                                                                         │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
multi-thre Thread 0x7fc5c0279b In: queue_sync_request                                                                                                                                                                                  L97   PC: 0x7fc5bd0f18e4 
(gdb) p *cluster
$1 = {
  cluster_id = 1,
  no_nodes = 0,
  node_list = 0x0,
  current_node = 0x0,
  send_sock = 0x0,
  lock = 0x7fc5bda518a8,
  top_version = 0,
  capabilities = 0x7fc5bda560f0,
  next = 0x0
}

To Reproduce

opensips.cfg

####### Global Parameters #########
log_level=4
xlog_level=4
stderror_enabled=no
syslog_enabled=yes
syslog_facility=LOG_LOCAL0
udp_workers=4
socket=udp:10.192.200.39:5060
socket=bin:10.192.200.39:5566

####### Modules Section ########

#set module path
mpath="/usr/local/opensips-upstream/lib64/opensips/modules/"
loadmodule "signaling.so"
loadmodule "sl.so"
loadmodule "tm.so"
loadmodule "rr.so"
loadmodule "maxfwd.so"
loadmodule "sipmsgops.so"
loadmodule "mi_fifo.so"
loadmodule "proto_udp.so"
loadmodule "proto_bin.so"
loadmodule "clusterer.so"
loadmodule "dialog.so"
loadmodule "db_mysql.so"

# configure modules
# TM
modparam("tm", "fr_timeout", 5)
modparam("tm", "fr_inv_timeout", 30)
modparam("tm", "restart_fr_on_each_reply", 0)
modparam("tm", "onreply_avp_mode", 1)
# RR
modparam("rr", "append_fromtag", 0)
# MI_FIFO
modparam("mi_fifo", "fifo_name", "/run/opensips/opensips_fifo")
modparam("mi_fifo", "fifo_mode", 0666)
# CLUSTERER
modparam("clusterer", "my_node_id", 200)
modparam("clusterer", "seed_fallback_interval", 300)
modparam("clusterer", "db_url", "mysql://******:******@localhost/opensips_upstream")
# DIALOG
modparam("dialog", "db_mode", 3)
modparam("dialog", "db_url", "mysql://******:******@localhost/opensips_upstream")
modparam("dialog", "profiles_with_value", "trunkCalls;")
modparam("dialog", "dialog_replication_cluster", 1)

####### Routing Logic ########
route{
        if (!mf_process_maxfwd_header(10)) {
                send_reply(483,"Too Many Hops");
                exit;
        }
        if (has_totag()) {
                if ( is_method("ACK") && t_check_trans() ) {
                        t_relay();
                        exit;
                }
                if ( !loose_route() ) {
                        send_reply(404,"Not here");
                        exit;
                }
                route(relay);
                exit;
        }
        if (is_method("CANCEL")) {
                if (t_check_trans())
                        t_relay();
                exit;
        }
        t_check_trans();
        if ( !(is_method("REGISTER")  ) ) {
                if (is_myself("$fd")) {
                } else {
                        if (!is_myself("$rd")) {
                                send_reply(403,"Relay Forbidden");
                                exit;
                        }
                }
        }
        if (loose_route()) {
                xlog("L_ERR", "Attempt to route with preloaded Route's [$fu/$tu/$ru/$ci]");
                if (!is_method("ACK"))
                        send_reply(403,"Preload Route denied");
                exit;
        }
        if (!is_method("REGISTER|MESSAGE"))
                record_route();
        if (!is_myself("$rd")) {
                append_hf("P-hint: outbound\r\n"); 
                route(relay);
        }
        if (is_method("PUBLISH|SUBSCRIBE")) {
                send_reply(503, "Service Unavailable");
                exit;
        }
        if ($rU==NULL) {
                # request with no Username in RURI
                send_reply(484,"Address Incomplete");
                exit;
        }
        route(relay);
}

route[relay] {
        # for INVITEs enable some additional helper routes
        if (is_method("INVITE")) {
                t_on_branch("per_branch_ops");
                t_on_reply("handle_nat");
                t_on_failure("missed_call");
        }

        if (!t_relay()) {
                send_reply(500,"Internal Error");
        }
        exit;
}
branch_route[per_branch_ops] {
        xlog("new branch at $ru\n");
}
onreply_route[handle_nat] {
        xlog("incoming reply\n");
}
failure_route[missed_call] {
        if (t_was_cancelled()) {
                exit;
        }
}

Clusterer table

INSERT INTO clusterer (id, cluster_id, node_id, url, state, no_ping_retries, priority, sip_addr, flags, description) VALUES
(1, 1, 200, 'bin:10.192.200.39:5566', 1, 3, 50, '10.192.200.39', 'seed', 'local_node'),
(2, 1, 20, 'bin:XXXXX8-XXX2g.XX.XXX.XXXX.XXXXXXXXXXXXXX.XXX:5566', 1, 3, 50, '10.192.200.39', 'seed', 'neighbor');

Expected behavior

Relevant System Logs

Sep 18 16:28:45 [236214] DBG:core:daemonize: waiting for status code from children
Sep 18 16:28:45 [236214] DBG:core:count_module_procs: modules require 1 extra processes
Sep 18 16:28:45 [236214] DBG:core:count_init_child_processes: 16 children are going to be inited
Sep 18 16:28:45 [236217] DBG:core:install_sigs: override SIGSEGV handler: success
Sep 18 16:28:45 [236217] DBG:core:set_core_dump: core dump limits set to 18446744073709551615
Sep 18 16:28:45 [236217] NOTICE:core:main: version: opensips 3.6.0-dev (x86_64/linux)
Sep 18 16:28:45 [236217] NOTICE:core:main: using 32 MB of shared memory, allocator: Q_MALLOC_DBG
Sep 18 16:28:45 [236217] NOTICE:core:main: using 16 MB of private process memory, allocator: Q_MALLOC_DBG
Sep 18 16:28:45 [236217] INFO:core:init_reactor_size: reactor size 1024 (using up to 0.04Mb of memory per process)
Sep 18 16:28:45 [236217] DBG:core:ipc_register_handler: IPC type 0 [RPC] registered with handler (nil)
Sep 18 16:28:45 [236217] DBG:core:__search_avp_map: looking for [serial_branch] avp  - found -1
Sep 18 16:28:45 [236217] DBG:core:new_avp_alias: added alias serial_branch with id 1
Sep 18 16:28:45 [236217] DBG:core:find_mod_export: <trace_bind_api> in module proto_hep not found
Sep 18 16:28:45 [236217] DBG:core:trace_prot_bind: <proto_hep> has no bind api function
Sep 18 16:28:45 [236217] DBG:core:try_load_trace_api: No tracing module used!
Sep 18 16:28:45 [236217] INFO:core:evi_publish_event: Registered event <E_CORE_THRESHOLD(1)>
Sep 18 16:28:45 [236217] INFO:core:evi_publish_event: Registered event <E_CORE_SHM_THRESHOLD(2)>
Sep 18 16:28:45 [236217] INFO:core:evi_publish_event: Registered event <E_CORE_PKG_THRESHOLD(3)>
Sep 18 16:28:45 [236217] INFO:core:evi_publish_event: Registered event <E_CORE_PROC_AUTO_SCALE(4)>
Sep 18 16:28:45 [236217] INFO:core:evi_publish_event: Registered event <E_CORE_TCP_DISCONNECT(5)>
Sep 18 16:28:45 [236217] DBG:core:init_mod: initializing module tm
Sep 18 16:28:45 [236217] INFO:tm:mod_init: TM - initializing...
Sep 18 16:28:45 [236217] DBG:core:get_flag_id_by_name: Flag name is null!
Sep 18 16:28:45 [236217] DBG:tm:init_callid: Call-ID initialization: '36ece506127557a6'
Sep 18 16:28:45 [236217] DBG:tm:lock_initialize: lock initialization started
Sep 18 16:28:45 [236217] DBG:tm:tm_init_timers: creating 0 parallel timer structures
Sep 18 16:28:45 [236217] DBG:tm:mod_init: timer set shift is 3
Sep 18 16:28:45 [236217] DBG:core:MD5StringArray: MD5 calculated: d0d1c9455982febf81eb76e31906bf37
Sep 18 16:28:45 [236217] DBG:core:register_pv_context: Registered new context: request / 0x7fc5bd1fa3f0
Sep 18 16:28:45 [236217] DBG:core:register_pv_context: Context not found
Sep 18 16:28:45 [236217] DBG:core:register_pv_context: Registered new context: reply / 0x7fc5bd1fa261
Sep 18 16:28:45 [236217] DBG:core:register_pv_context: Context not found
Sep 18 16:28:45 [236217] DBG:core:add_pv_context: Registered new context: reply
Sep 18 16:28:45 [236217] DBG:core:__search_avp_map: looking for [uac_ctx] avp  - found -1
Sep 18 16:28:45 [236217] DBG:core:new_avp_alias: added alias uac_ctx with id 2
Sep 18 16:28:45 [236217] DBG:tm:tm_init_cluster: tm_replication_cluster not set - not engaging!
Sep 18 16:28:45 [236217] DBG:core:init_mod: registering stats for tm
Sep 18 16:28:45 [236217] DBG:core:init_mod: register MI for tm
Sep 18 16:28:45 [236217] DBG:core:init_mod: initializing module sl
Sep 18 16:28:45 [236217] INFO:sl:mod_init: Initializing StateLess engine
Sep 18 16:28:45 [236217] DBG:core:init_mod: registering stats for sl
Sep 18 16:28:45 [236217] DBG:core:init_mod: initializing module signaling
Sep 18 16:28:45 [236217] NOTICE:signaling:mod_init: initializing module ...
Sep 18 16:28:45 [236217] DBG:core:find_mod_cmd_export_t: found <load_tm> in module tm [/usr/local/opensips-upstream/lib64/opensips/modules/]
Sep 18 16:28:45 [236217] DBG:core:find_mod_cmd_export_t: found <load_sl> in module sl [/usr/local/opensips-upstream/lib64/opensips/modules/]
Sep 18 16:28:45 [236217] DBG:core:init_mod: initializing module rr
Sep 18 16:28:45 [236217] INFO:rr:mod_init: rr - initializing
Sep 18 16:28:45 [236217] DBG:core:init_mod: initializing module maxfwd
Sep 18 16:28:45 [236217] INFO:maxfwd:mod_init: initializing...
Sep 18 16:28:45 [236217] DBG:core:init_mod: initializing module sipmsgops
Sep 18 16:28:45 [236217] INFO:sipmsgops:mod_init: initializing...
Sep 18 16:28:45 [236217] DBG:core:init_mod: initializing module mi_fifo
Sep 18 16:28:45 [236217] DBG:mi_fifo:mi_mod_init: testing fifo existence ...
Sep 18 16:28:45 [236217] DBG:core:init_mod: initializing module proto_udp
Sep 18 16:28:45 [236217] INFO:core:mod_init: initializing UDP-plain protocol
Sep 18 16:28:45 [236217] DBG:core:init_mod: initializing module proto_bin
Sep 18 16:28:45 [236217] INFO:proto_bin:mod_init: initializing BIN protocol
Sep 18 16:28:45 [236217] DBG:core:init_mod: initializing module clusterer
Sep 18 16:28:45 [236217] INFO:clusterer:mod_init: Clusterer module - initializing
Sep 18 16:28:45 [236217] DBG:core:find_mod_export: found <db_bind_api> in module db_mysql [/usr/local/opensips-upstream/lib64/opensips/modules/]
Sep 18 16:28:45 [236217] DBG:core:db_bind_mod: using db bind api for db_mysql
Sep 18 16:28:45 [236217] DBG:core:db_do_init: connection 0x7fc5bf2c4a30 not found in pool
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_connect: opening connection: mysql://xxxx:xxxx@localhost/opensips_upstream
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_connect: connection type is Localhost via UNIX socket
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_connect: protocol version is 10
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_connect: server version is 5.5.5-10.3.39-MariaDB-0ubuntu0.20.04.2
Sep 18 16:28:45 [236217] DBG:core:db_do_init: connection 0x7fc5bf2c4a30 inserted in pool as 0x7fc5bf2c4cc0
Sep 18 16:28:45 [236217] DBG:db_mysql:mysql_raise_event: event not registered -1
Sep 18 16:28:45 [236217] DBG:core:db_new_result: allocate 48 bytes for result set at 0x7fc5bf2c5388
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: 1 columns returned from the query
Sep 18 16:28:45 [236217] DBG:core:db_allocate_columns: allocate 28 bytes for result columns at 0x7fc5bf2c5420
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x7fc5bf2c5428)[0]=[table_version]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: use DB_INT result type
Sep 18 16:28:45 [236217] DBG:core:db_allocate_rows: allocate 48 bytes for result rows and values at 0x7fc5bf2c54a8
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting INT [4]
Sep 18 16:28:45 [236217] DBG:core:db_free_columns: freeing result columns at 0x7fc5bf2c5420
Sep 18 16:28:45 [236217] DBG:core:db_free_rows: freeing 1 rows
Sep 18 16:28:45 [236217] DBG:core:db_free_row: freeing row values at 0x7fc5bf2c54b8
Sep 18 16:28:45 [236217] DBG:core:db_free_rows: freeing rows at 0x7fc5bf2c54a8
Sep 18 16:28:45 [236217] DBG:core:db_free_result: freeing result set at 0x7fc5bf2c5388
Sep 18 16:28:45 [236217] DBG:clusterer:load_db_info: DB query - retrieve the list of clusters in which the local node runs
Sep 18 16:28:45 [236217] DBG:db_mysql:mysql_raise_event: event not registered -1
Sep 18 16:28:45 [236217] DBG:core:db_new_result: allocate 48 bytes for result set at 0x7fc5bf2c5388
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: 1 columns returned from the query
Sep 18 16:28:45 [236217] DBG:core:db_allocate_columns: allocate 28 bytes for result columns at 0x7fc5bf2c5420
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x7fc5bf2c5428)[0]=[cluster_id]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: use DB_INT result type
Sep 18 16:28:45 [236217] DBG:core:db_allocate_rows: allocate 48 bytes for result rows and values at 0x7fc5bf2c54a8
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting INT [1]
Sep 18 16:28:45 [236217] DBG:clusterer:load_db_info: 1 rows found in clusterer
Sep 18 16:28:45 [236217] DBG:core:db_free_columns: freeing result columns at 0x7fc5bf2c5420
Sep 18 16:28:45 [236217] DBG:core:db_free_rows: freeing 1 rows
Sep 18 16:28:45 [236217] DBG:core:db_free_row: freeing row values at 0x7fc5bf2c54b8
Sep 18 16:28:45 [236217] DBG:core:db_free_rows: freeing rows at 0x7fc5bf2c54a8
Sep 18 16:28:45 [236217] DBG:core:db_free_result: freeing result set at 0x7fc5bf2c5388
Sep 18 16:28:45 [236217] DBG:clusterer:load_db_info: DB query - retrieve nodes info
Sep 18 16:28:45 [236217] DBG:db_mysql:mysql_raise_event: event not registered -1
Sep 18 16:28:45 [236217] DBG:core:db_new_result: allocate 48 bytes for result set at 0x7fc5bf2c5388
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: 10 columns returned from the query
Sep 18 16:28:45 [236217] DBG:core:db_allocate_columns: allocate 280 bytes for result columns at 0x7fc5bf2c5420
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x7fc5bf2c5470)[0]=[id]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: use DB_INT result type
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x7fc5bf2c5480)[1]=[cluster_id]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: use DB_INT result type
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x7fc5bf2c5490)[2]=[node_id]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: use DB_INT result type
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x7fc5bf2c54a0)[3]=[url]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: use DB_STRING result type
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x7fc5bf2c54b0)[4]=[state]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: use DB_INT result type
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x7fc5bf2c54c0)[5]=[no_ping_retries]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: use DB_INT result type
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x7fc5bf2c54d0)[6]=[priority]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: use DB_INT result type
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x7fc5bf2c54e0)[7]=[sip_addr]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: use DB_STRING result type
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x7fc5bf2c54f0)[8]=[flags]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: use DB_STRING result type
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: RES_NAMES(0x7fc5bf2c5500)[9]=[description]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_get_columns: use DB_STRING result type
Sep 18 16:28:45 [236217] DBG:core:db_allocate_rows: allocate 672 bytes for result rows and values at 0x7fc5bf2c6858
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting INT [2]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting INT [1]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting INT [20]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting STRING [bin:XXXXX8-XXX2g.XX.XXX.XXXX.XXXXXXXXXXXXXX.XXX:5566]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting INT [1]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting INT [3]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting INT [50]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting STRING [10.192.200.39]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting STRING [seed]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting STRING [neighbor]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting INT [1]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting INT [1]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting INT [200]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting STRING [bin:10.192.200.39:5566]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting INT [1]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting INT [3]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting INT [50]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting STRING [10.192.200.39]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting STRING [seed]
Sep 18 16:28:45 [236217] DBG:db_mysql:db_mysql_str2val: converting STRING [local_node]
Sep 18 16:28:45 [236217] DBG:clusterer:load_db_info: 2 rows found in clusterer
Sep 18 16:28:45 [236217] DBG:clusterer:add_node_info: adding node: XXXXX8-XXX2g.XX.XXX.XXXX.XXXXXXXXXXXXXX.XXX:5566 (df: 5555)
Sep 18 16:28:45 [236217] DBG:core:sip_resolvehost: has port -> do A record lookup!
Sep 18 16:28:45 [236217] ERROR:clusterer:add_node_info: Cannot resolve host: XXXXX8-XXX2g.XX.XXX.XXXX.XXXXXXXXXXXXXX.XXX
Sep 18 16:28:45 [236217] ERROR:clusterer:load_db_info: Unable to add node info to backing list
Sep 18 16:28:45 [236217] DBG:core:pool_remove: removing connection from the pool
Sep 18 16:28:45 [236217] INFO:core:evi_publish_event: Registered event <E_CLUSTERER_REQ_RECEIVED(6)>
Sep 18 16:28:45 [236217] INFO:core:evi_publish_event: Registered event <E_CLUSTERER_RPL_RECEIVED(7)>
Sep 18 16:28:45 [236217] INFO:core:evi_publish_event: Registered event <E_CLUSTERER_NODE_STATE_CHANGED(8)>
Sep 18 16:28:45 [236217] DBG:core:_create_new_group: adding new group [clusterer]
Sep 18 16:28:45 [236217] DBG:core:sr_register_identifier: adding new identifier [node_states] to group [clusterer]
Sep 18 16:28:45 [236217] DBG:core:sr_register_identifier: adding new identifier [sharing_tags] to group [clusterer]
Sep 18 16:28:45 [236217] INFO:core:evi_publish_event: Registered event <E_CLUSTERER_SHARING_TAG_CHANGED(9)>
Sep 18 16:28:45 [236217] DBG:core:init_mod: registering stats for clusterer
Sep 18 16:28:45 [236217] DBG:core:init_mod: register MI for clusterer
Sep 18 16:28:45 [236217] DBG:core:init_mod: initializing module db_mysql
Sep 18 16:28:45 [236217] DBG:db_mysql:mysql_mod_init: mysql: MySQL client version is 8.0.39
Sep 18 16:28:45 [236217] INFO:core:evi_publish_event: Registered event <E_MYSQL_CONNECTION(10)>
Sep 18 16:28:45 [236217] DBG:core:init_mod: initializing module dialog
Sep 18 16:28:45 [236217] INFO:dialog:mod_init: Dialog module - initializing
Sep 18 16:28:45 [236217] INFO:core:evi_publish_event: Registered event <E_DLG_STATE_CHANGED(11)>
Sep 18 16:28:45 [236217] DBG:dialog:add_profile_definitions: creating profile <trunkCalls> 
Sep 18 16:28:45 [236217] DBG:core:find_mod_cmd_export_t: found <load_tm> in module tm [/usr/local/opensips-upstream/lib64/opensips/modules/]
Sep 18 16:28:45 [236217] DBG:core:find_mod_cmd_export_t: found <load_rr> in module rr [/usr/local/opensips-upstream/lib64/opensips/modules/]
Sep 18 16:28:45 [236217] DBG:core:find_mod_cmd_export_t: found <load_clusterer> in module clusterer [/usr/local/opensips-upstream/lib64/opensips/modules/]
Sep 18 16:28:45 [236217] DBG:core:sr_register_identifier: adding new identifier [cap:dialog-dlg-repl] to group [clusterer]
Sep 18 16:28:45 [236217] DBG:clusterer:cl_register_cap: Registered capability: dialog-dlg-repl
Sep 18 16:28:45 [236217] DBG:clusterer:cl_request_sync: requesting dialog-dlg-repl sync in cluster 1
Sep 18 16:28:45 [236217] DBG:clusterer:cl_request_sync: donor node not found
Sep 18 16:28:45 [236217] CRITICAL:core:sig_usr: segfault in attendant (starter) process!
Sep 18 16:28:45 [236217] DBG:core:restore_segv_handler: restoring SIGSEGV handler...
Sep 18 16:28:45 [236217] DBG:core:restore_segv_handler: successfully restored system SIGSEGV handler
Sep 18 16:28:45 [236214] DBG:core:wait_status_code: read code 28 (0 byte)
Sep 18 16:28:45 [236214] INFO:core:daemonize: pre-daemon process exiting with -1

backtrace full

#0  queue_sync_request (cluster=0x7fc5bda51800, lcap=0x7fc5bda560f0) at sync.c:97
        __FUNCTION__ = "queue_sync_request"
#1  0x00007fc5bd0f2003 in cl_request_sync (capability=0x7fc5bd0dea10 <dlg_repl_cap>, cluster_id=1, from_cb=0) at sync.c:174
        cluster = 0x7fc5bda51800
        lcap = 0x7fc5bda560f0
        source_id = 0
        rc = -1
        __FUNCTION__ = "cl_request_sync"
#2  0x00007fc5bd08bc53 in dlg_init_clustering () at dlg_replication.c:160
        __FUNCTION__ = "dlg_init_clustering"
#3  0x00007fc5bd06cbc6 in mod_init () at dialog.c:843
        n = 59809476
        __FUNCTION__ = "mod_init"
#4  0x000055a21d03b87f in init_mod (m=0x7fc5bf2b6980, skip_others=0) at sr_module.c:761
        dep = 0x0
        __FUNCTION__ = "init_mod"
#5  0x000055a21d03b73f in init_mod (m=0x7fc5bf2b7450, skip_others=0) at sr_module.c:742
        dep = 0x55a21d3e33e0 <cdb_slow_queries>
        __FUNCTION__ = "init_mod"
#6  0x000055a21d03bd10 in init_modules () at sr_module.c:821
        currentMod = 0x0
        ret = 32764
        __FUNCTION__ = "init_modules"
#7  0x000055a21d037055 in main (argc=4, argv=0x7ffc6bcdf728) at main.c:880
        result = 0
        pred_cmp = 0x55a21d033932 <op_neq>
        c = -1
        n = 12
        tmp = 0xf0b5ff <error: Cannot access memory at address 0xf0b5ff>
        tmp_len = 59809476
        port = -1
        proto = 0
        protos_no = 2
        options = 0x55a21d1dad48 "A:f:cCm:M:b:l:n:N:rRvdDFEVhw:t:u:g:p:P:G:W:o:a:k:s:"
        ret = -1
        seed = 2615788149
        rfd = 3
        procs_no = 8388608
        __FUNCTION__ = "main"

OS/environment information

Additional context

tommybrecher commented 1 day ago

@bogdan-iancu, few more comments I have to add. I have found out why I was unable to reproduce this issue in my testing environment. cluster_list does have a ->current_cluster in my lab example.

I have pinpointed the problem to be an ordering issue (node_id on the local node is higher than the node_id of the remote node).

I haven't drilled down into the query that clusterer runs, but I assume it is ordering by node_id ASC.

In my case since the lower node_id was that of the remote node and that failed on DNS resolution, ->current_node was NULL on the next run resulting in the segmentation fault.

Additionally, I've found the correct place to make opensips gracefully quit without core-dumping (but I'm not sure if you want a more thorough fix.)

See PR #3474