andrewvmail closed this issue 5 years ago.
Yeah, that sounds like a good idea; I will look into it. Meanwhile, note that you can check whether something is listening on a TCP socket (if drachtio is configured for TCP).
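A minimal sketch of that TCP check in Python, roughly equivalent to `nc -z host port`. The function name and timeout are illustrative assumptions. Keep in mind it only proves that something accepts TCP connections on the port, not that drachtio is actually processing SIP.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable, or timed out.
        return False
```

For example, `tcp_port_open("127.0.0.1", 5060)` would check a local drachtio TCP listener, assuming it uses the default SIP port.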
Actually... I forgot that I recently added support for Prometheus monitoring:
https://drachtio.org/docs/drachtio-server#monitoring-section
If you enable this feature in your config, you get an HTTP endpoint to hit. Not only does it return a 200 OK, it also gives you a bunch of stats.
This should work for your needs, right?
Oh wow nice, this will work. Thanks!
Hi Dave, unfortunately we got into a state where the Prometheus endpoint is responding but the WebSocket server is not. I tried to introspect over TCP: running netcat against the WebSocket endpoint (port 80) shows the port still open, but the service is down. We might need that /healthcheck on the WebSocket side after all.
Please send a drachtio log captured while you send a SIP INVITE to the WebSocket.
Diving into the internals, I found the handshake code. I thought it was running a "webserver"; maybe it is not possible to add that /healthcheck without modifying sofia.
Okay, getting logs. In the meantime, I think I can also run a container that translates the 400 into a 200.
No, it is not running a web server. If you listen on a ws/wss port, that port is for SIP-over-WebSocket traffic.
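Because the ws port speaks only SIP over WebSocket, a probe stronger than netcat could perform the WebSocket opening handshake (RFC 6455) with the `sip` subprotocol (RFC 7118) and require a `101 Switching Protocols` reply. This is a sketch that assumes drachtio answers a standard upgrade request on that port; the helper name is hypothetical, it does not validate `Sec-WebSocket-Accept`, and like any transport-level check it cannot tell whether the application behind drachtio answers INVITEs.

```python
import base64
import os
import socket


def ws_handshake_ok(host: str, port: int, path: str = "/",
                    timeout: float = 2.0, subprotocol: str = "sip") -> bool:
    """Send a WebSocket upgrade request; True only if the server replies 101."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")  # random 16-byte nonce
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        f"Sec-WebSocket-Protocol: {subprotocol}\r\n"
        "\r\n"
    )
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request.encode("ascii"))
            with sock.makefile("rb") as reader:
                # Only the status line is inspected, e.g. b"HTTP/1.1 101 ...".
                status_line = reader.readline(1024)
    except OSError:
        # Refused, reset, or no reply within `timeout`: the
        # "port open but service down" case netcat cannot detect.
        return False
    parts = status_line.split()
    return len(parts) >= 2 and parts[1] == b"101"
```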
Dave, I sent you the logs by email in a secret gist.
The problem here seems to be that the INVITE was sent to a drachtio client that did not respond. So either (a) the server has a bug and did not in fact send the INVITE to the client (even though it thinks it did), or (b) the client app has an issue and did not respond. How far back do your logs go? Can you go back as far as when the app last successfully responded to an INVITE?
In short, I'm not sure it's just a matter of testing the server's health. There seems to be a bug here, and it may well be on the client side.
Hi Dave, sorry, I thought you just wanted a sample of an INVITE that I'm sending to the server. That log is not from the time when the WebSocket stopped responding; it was cut off early, and there is more to it. I just upgraded some components and lost the older logs in the process.
I have a workaround to detect an unresponsive connection: a small service proxies the HTTP request and rewrites the response code from 400 to 200, which is what the Rancher health check system expects. That way the container doesn't stay stuck, but we probably still need to get to the underlying issue.
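A minimal sketch of such a translating health service, using only the Python standard library. The upstream URL, the sidecar port 8080, its `/healthcheck` path, and the 503 on failure are assumptions for illustration; treating 400 as "alive" follows only from the thread's report that a plain GET to the ws port returns 400.

```python
import http.client
import http.server
import urllib.error
import urllib.request

# Hypothetical default: the drachtio ws listener mentioned in the thread (port 80).
DEFAULT_UPSTREAM = "http://127.0.0.1:80/"
# Bypass any HTTP proxy from the environment; a health probe should go direct.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, healthy_codes=frozenset({400}), timeout: float = 2.0) -> bool:
    """GET `url` and report whether it answered with one of `healthy_codes`."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the status code is exactly what we need.
        status = err.code
    except (OSError, http.client.HTTPException):
        # Refused, timed out, reset, or the peer did not speak HTTP.
        return False
    return status in healthy_codes


class HealthHandler(http.server.BaseHTTPRequestHandler):
    """GET /healthcheck -> 200 if the upstream answered as expected, else 503."""

    def do_GET(self):
        if self.path != "/healthcheck":
            self.send_error(404)
            return
        ok = probe(self.server.upstream, self.server.healthy_codes)
        body = b"ok\n" if ok else b"unhealthy\n"
        self.send_response(200 if ok else 503)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep periodic probe traffic out of the container logs


def make_server(host="0.0.0.0", port=8080, upstream=DEFAULT_UPSTREAM,
                healthy_codes=frozenset({400})):
    """Build (but do not start) the health server; settings live on the server."""
    server = http.server.ThreadingHTTPServer((host, port), HealthHandler)
    server.upstream = upstream
    server.healthy_codes = healthy_codes
    return server
```

Run it next to drachtio with `make_server().serve_forever()` and point the orchestrator's HTTP health check at `/healthcheck` on that port. Like the Prometheus endpoint, this only shows that the listener answers HTTP; it would not catch the case discussed in this thread where an INVITE reaches drachtio but the client app never responds.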
I will monitor the server and send you the log when the connection becomes unresponsive again.
Yes, let's see if you can get it to become unresponsive again, and then let's look at the logs.
Sounds good. By the way, unrelated: I was browsing your repos and found https://github.com/davehorton/drachtio-cpaas-portal. It looks interesting; I'm curious what's cooking.
Interesting bug in Github
Hi Dave,
Is it possible to get an endpoint in the drachtio WebSocket server, such as /healthcheck, that returns HTTP 200 when you send a GET to it?
I'm trying to auto-restart drachtio-server using https://rancher.com/blog/2018/2018-08-22-k8s-monitoring-and-healthchecks/
Currently it does return something, but it is a 400.
With this healthcheck API, when the container stops responding for any reason, I can restart it using Rancher's built-in health check system.
Thanks Andrew