Open tommybrecher opened 1 day ago
@bogdan-iancu, I have a few more comments to add. I found out why I was unable to reproduce this issue in my testing environment: cluster_list does have a ->current_cluster in my lab example.
I have pinpointed the problem to be an ordering issue (node_id on the local node is higher than the node_id of the remote node).
I haven't drilled down into the query that clusterer runs, but I assume it is ordering by node_id ASC.
In my case, since the lower node_id belonged to the remote node and that node failed DNS resolution, ->current_node was NULL on the next run, resulting in the segmentation fault.
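To make the suspected ordering problem concrete, here is a toy model of it. This is only a guess at the mechanism and is not taken from the clusterer source: the `row` and `node` structs and the `load_nodes` function are hypothetical, and the "stop at the first unresolvable remote node" behaviour is an assumption matching the description above.

```c
#include <stddef.h>

/* Hypothetical row of the clusterer table, assumed sorted by node_id ASC. */
struct row {
    int node_id;
    int resolvable;   /* 1 if the url's hostname resolves, 0 otherwise */
};

/* Hypothetical in-memory node; only the id is modelled. */
struct node {
    int node_id;
};

/* Toy loader (assumption, not OpenSIPS code): walks the rows in order and
 * stops at the first remote node whose hostname does not resolve.
 * Returns a pointer to the local node, or NULL if the loop stopped
 * before the local node's row was reached. */
static struct node *load_nodes(const struct row *rows, size_t n,
                               int local_id, struct node *out)
{
    struct node *current = NULL;

    for (size_t i = 0; i < n; i++) {
        if (rows[i].node_id == local_id) {
            out->node_id = local_id;
            current = out;          /* local node found and recorded */
            continue;
        }
        if (!rows[i].resolvable)
            break;                  /* loading aborted: later rows are never seen */
    }
    return current;
}
```

Under this model, a local node with the lower node_id is recorded before the failing remote row is reached (the lab case), while a local node with the higher node_id is never reached and the current-node pointer stays NULL.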
Additionally, I've found the correct place to make opensips quit gracefully without core-dumping (but I'm not sure if you want a more thorough fix).
See PR #3474
OpenSIPS version you are running
Describe the bug
When starting opensips, if the url field of the clusterer DB table contains a hostname that can't be resolved (missing DNS entry), opensips will segfault in sync.c:97 (queue_sync_request). This happens because cluster->current_node is NULL, resulting in a segmentation fault when trying to access ->flags.
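The crash pattern described here is a dereference of a NULL current_node. A minimal sketch of the kind of guard that avoids it follows. The structs are hypothetical stand-ins that model only the two fields named in this report, and `queue_sync_request_checked` is an illustrative name, not the actual OpenSIPS function or the change made in the PR.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the clusterer structures; only the fields
 * mentioned in this report (current_node, flags) are modelled. */
struct node_info {
    int node_id;
    unsigned int flags;
};

struct cluster_info {
    int cluster_id;
    struct node_info *current_node;  /* NULL if the local node was never set up */
};

/* Returns 0 on success, -1 if the cluster has no usable local node.
 * The caller can then abort startup cleanly instead of crashing. */
static int queue_sync_request_checked(struct cluster_info *cluster,
                                      unsigned int flag)
{
    if (cluster == NULL || cluster->current_node == NULL) {
        fprintf(stderr, "cluster %d: local node not initialised\n",
                cluster ? cluster->cluster_id : -1);
        return -1;
    }
    cluster->current_node->flags |= flag;  /* safe: pointer checked above */
    return 0;
}
```

A guard like this only protects one call site. As noted under Additional context, the same access is attempted elsewhere (timer, etc.), so each of those paths would need equivalent handling or startup would need to fail before they run.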
To Reproduce
opensips.cfg
Clusterer table
Expected behavior
Relevant System Logs
backtrace full
OS/environment information
Additional context
if (cluster->current) but ran into other issues in other areas of the code where the same access is attempted (timer, etc.), and after multiple attempts just got opensips deadlocked.