drachtio / drachtio-server

A SIP call processing server that can be controlled via nodejs applications
https://drachtio.org
MIT License
239 stars, 92 forks

Latest Drachtio built from source is "Hanging" #254

Closed: DBoag closed this issue 1 year ago

DBoag commented 1 year ago

Hi All,

We've been running an older version of Drachtio (v0.8.10) for a while but encountered some weird issues when communicating with it on port 443 (via WSS) as opposed to port 4433 which we've been using until now. We want to move to port 443 for standardisation.

In a low-call-volume environment we found that the latest Drachtio build (v0.8.20-4-gb01823713) resolved the port 443 problem and decided to go the route of upgrading our other servers rather than trying to debug the 443 problem on the current version.

I did that upgrade on one of our busy servers last night and everything tested OK. The only change was switching out the Drachtio binary and restarting. I tested again this morning before things got busy and all was still OK.

However, just after 8, Drachtio stopped logging to its log file and stopped responding to requests. The Drachtio process was still visible in ps, and netstat still showed many connections to port 4433. The load and memory on the box all looked OK. After a few minutes I restarted Drachtio, and everything ran fine for only a few minutes before the same thing happened again.

I then changed back to the older Drachtio version and all has been running fine since. I can't find any errors or anything suspicious in the system or Drachtio logs around the time of the failure. It looks like that new Drachtio version just "gives up" under load.

Has anything like this been experienced before? Any ideas how I can debug this further? We're running on Ubuntu 20.04.3 LTS.

Thanks, Duncan
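One way to catch a hang like the one described above while it is happening, so that a full log or a stack trace can be captured before restarting, is a small external watchdog. The sketch below is not part of drachtio; it is a heuristic under stated assumptions. The log path `/var/log/drachtio/drachtio.log` and the 60-second silence threshold are hypothetical and need adjusting, and it assumes that a busy server at debug log level writes to its log far more often than once a minute. It reports "hung" when the process still exists but the log file has stopped being modified, which matches the symptom in this report.

```python
import os
import time

# Hypothetical defaults: adjust both to your own deployment.
LOG_PATH = "/var/log/drachtio/drachtio.log"  # assumed log location
MAX_SILENCE_SECONDS = 60.0  # assumed: a busy server at debug level logs far more often


def is_stale(mtime: float, now: float, max_silence: float) -> bool:
    """True if the last modification is older than max_silence seconds."""
    return (now - mtime) > max_silence


def pid_alive(pid: int) -> bool:
    """True if a process with this PID exists (signal 0 only checks, it sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def classify(alive: bool, log_stale: bool) -> str:
    """Map the two observations onto the states seen in this report."""
    if not alive:
        return "down"
    if log_stale:
        return "hung"  # still visible in ps, but the log has stopped growing
    return "ok"


def check_once(pid: int, log_path: str = LOG_PATH,
               max_silence: float = MAX_SILENCE_SECONDS) -> str:
    """Take one observation; raises FileNotFoundError if log_path is missing."""
    mtime = os.path.getmtime(log_path)
    return classify(pid_alive(pid), is_stale(mtime, time.time(), max_silence))


def watch(pid: int, interval: float = 10.0) -> None:
    """Poll forever, printing a line whenever the state is not "ok"."""
    while True:
        state = check_once(pid)
        if state != "ok":
            print(f"drachtio pid {pid}: {state}", flush=True)
        time.sleep(interval)
```

For example, calling `watch(pid)` with the PID reported by `pgrep drachtio` prints a line each time the state is not "ok". When it reports "hung", attaching a debugger (for instance `gdb -p <pid> -batch -ex "thread apply all bt"`) is a general technique for recording where each thread is blocked.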

davehorton commented 1 year ago

Can you get a log (or retrieve the existing log) from when this happens? If possible, it would be great to have drachtio server running with loglevel debug and sofia-loglevel 9.

davehorton commented 1 year ago

Another useful test would be to see whether this problem occurs if you build and run v0.8.20. I am trying to localize which recent change may be related to this issue.

DBoag commented 1 year ago

Hi Dave,

Thanks for the response. We did have the debug log level and sofia log level 9 set. I've attached an extract of the log from either side of the "hang". I've inserted an extra line with "--- GAP HERE ---" where the logging stopped so it's easy to find, but you can also see the big gap in the timestamps there.

I will build v0.8.20 and try that out as you suggest and give you feedback.

Regards, Duncan

drachtiohang1.txt

davehorton commented 1 year ago

That log seems strange: no SIP messages are shown, and other content is missing. It's clearly not the raw log file. Do you have that?

davehorton commented 1 year ago

Feel free to send it to me directly if necessary (daveh@drachtio.org) and/or join my Slack channel for a faster turnaround: joinslack.jambonz.org

DBoag commented 1 year ago

This conversation is currently continuing on Slack. I'll update here as appropriate for the community.

DBoag commented 1 year ago

It appears this issue might have something to do with how we built Drachtio on Ubuntu. We're now running an official Drachtio Docker image, and so far the issue has not occurred again. We're not sure exactly what scenario causes it, though. I'm leaving this open for another week or so to confirm that the Docker image resolves the issue, as it occurs intermittently.

DBoag commented 1 year ago

Closing, as the Docker images resolve the issue. We still need to identify why our own builds are not working properly, but that looks like our problem.