ws endpoint not responding

drachtio / drachtio-server

A SIP call processing server that can be controlled via nodejs applications

https://drachtio.org

MIT License

ws endpoint not responding #78

Closed andrewvmail closed 5 years ago

andrewvmail commented 5 years ago

Hi Dave,

Is it possible to get an endpoint in the drachtio websocket server, like /healthcheck, that returns an HTTP 200 when you send a GET to it?

I'm trying to auto restart drachtio-server using https://rancher.com/blog/2018/2018-08-22-k8s-monitoring-and-healthchecks/

Currently it does return something, but it is a 400. [screenshot]

When the container stops responding to this healthcheck API for any reason, I can restart it using Rancher's built-in health check system.

Thanks Andrew

davehorton commented 5 years ago

Yeah, that sounds like a good idea; I will look into it. Meanwhile, note that you can use a TCP check to verify that something is listening on the socket (if drachtio is configured for TCP).
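
A minimal sketch of such a TCP check, assuming Python is available in the health check environment. The function name, host, and port are illustrative and do not come from drachtio. Note that a successful connect only shows that something accepts connections on the port, not that the application behind it is processing traffic.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and resolution failures.
        return False
```

For example, `tcp_port_open("127.0.0.1", 5060)` would probe a listener on port 5060; the port to use depends on how drachtio is configured.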

davehorton commented 5 years ago

Actually, I forgot that I just recently added support for Prometheus monitoring:

https://drachtio.org/docs/drachtio-server#monitoring-section

If you enable this feature in your config, it gives you an HTTP endpoint to hit. Not only does it return a 200 OK, it gives you a bunch of stats as well.
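
A health check against that endpoint could be sketched as follows. The URL is an assumption: the actual host, port, and path depend on the monitoring section of your drachtio config and are not given in this thread.

```python
import http.client
import urllib.error
import urllib.request


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True only if a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # urlopen raises HTTPError (a URLError) for 4xx/5xx responses,
        # and OSError-based errors for refused connections and timeouts.
        return False
```

Usage would look like `http_ok("http://127.0.0.1:9090/metrics")`, where the address is a hypothetical placeholder for whatever the Prometheus endpoint is configured to listen on.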

This should work for your needs, right?

andrewvmail commented 5 years ago

Oh wow nice, this will work. Thanks!

andrewvmail commented 5 years ago

Hi Dave, unfortunately we got into a state where the Prometheus endpoint is responding but the websocket server is not. I tried to introspect over TCP: a netcat against the websocket endpoint (port 80) shows the port still open, but the service is down. We might need that /healthcheck on the websocket part after all.

davehorton commented 5 years ago

Please send a drachtio log from when you send a SIP INVITE to the websocket.

andrewvmail commented 5 years ago

Just diving into the internals, I found the handshake code. I thought it was running a "webserver". It may not be possible to add that /healthcheck without modifying sofia.

https://freeswitch.org/stash/projects/FS/repos/freeswitch/browse/libs/sofia-sip/libsofia-sip-ua/tport/ws.c#251

andrewvmail commented 5 years ago

"please send a drachtio log when you send a SIP invite to the websocket"

Okay, getting logs. In the meantime, I think I can also run a container that translates the 400 into a 200.

davehorton commented 5 years ago

No, it is not running a webserver. If you listen on a ws/wss port, that port is for SIP over WebSocket traffic.
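
Since the port speaks WebSocket rather than plain HTTP, one way to probe it is to send a WebSocket opening handshake and look at the status line. The sketch below is an assumption-laden illustration, not something drachtio provides: it requests the "sip" subprotocol defined for SIP over WebSocket (RFC 7118) and returns the HTTP status code, where 101 would indicate a completed upgrade. How drachtio or sofia answers this exact request has not been verified in this thread.

```python
import base64
import os
import socket
from typing import Optional


def ws_handshake_status(host: str, port: int, timeout: float = 2.0) -> Optional[int]:
    """Send a WebSocket upgrade request; return the HTTP status code or None."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    request = (
        "GET / HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "Sec-WebSocket-Protocol: sip\r\n"
        "\r\n"
    )
    data = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request.encode("ascii"))
            # Read until the status line is complete or the peer closes.
            while b"\r\n" not in data:
                chunk = sock.recv(1024)
                if not chunk:
                    break
                data += chunk
    except OSError:
        # Refused, timed out, or reset: treat as "no answer".
        return None
    parts = data.split(b"\r\n", 1)[0].split()
    if len(parts) >= 2 and parts[0].startswith(b"HTTP/") and parts[1].isdigit():
        return int(parts[1])
    return None
```

A port that accepts the connection but never answers the handshake yields `None` here, which is the case a bare TCP connect cannot detect.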

andrewvmail commented 5 years ago

Dave, I sent you the logs by email, in a secret gist.

davehorton commented 5 years ago

The problem we have here seems to be that the INVITE got sent to a drachtio client that did not respond. So either (a) the server has a bug and did not in fact send the INVITE to the client (even though it thinks it did), or (b) the client app has an issue and did not respond. How far back do you have logs? Can you go back as far as when the app last successfully responded to an INVITE?

In short, I'm not sure it's just an issue of testing the server's health. There seems to be a bug here, and it may be on the client side.

andrewvmail commented 5 years ago

Hi Dave, sorry, I thought you just wanted a sample of an INVITE that I'm sending to the server. That log is not from the time when the websocket was not responding; there is more to that log, but it was cut off early. I just upgraded some stuff and lost the older logs in the process.

I have a workaround to test for an unresponsive connection: a small service proxies the HTTP request and changes the response code from 400 to 200, which is what Rancher's health check system expects. That is just so it doesn't get stuck; we probably still need to get to the underlying issue.
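
A rough sketch of that kind of translating proxy, using only the Python standard library. This is not the actual service used here; the names, ports, and the choice of 503 for "no answer" are assumptions. Any HTTP response from the upstream port (including the 400) is reported as 200, and a connection failure or timeout is reported as 503.

```python
import http.client
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_handler(upstream_host: str, upstream_port: int, timeout: float = 2.0):
    """Build a handler that maps 'upstream answered at all' to HTTP 200."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            conn = http.client.HTTPConnection(
                upstream_host, upstream_port, timeout=timeout
            )
            try:
                conn.request("GET", "/")
                conn.getresponse().read()  # any status, e.g. 400, counts as alive
                status = 200
            except (OSError, http.client.HTTPException):
                status = 503  # no usable answer from upstream
            finally:
                conn.close()
            self.send_response(status)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, fmt, *args):
            pass  # keep health check polling out of the logs

    return HealthHandler


def run_health_proxy(listen_port: int, upstream_host: str, upstream_port: int) -> None:
    """Serve the health endpoint until interrupted."""
    server = HTTPServer(("0.0.0.0", listen_port), make_handler(upstream_host, upstream_port))
    server.serve_forever()
```

Calling `run_health_proxy(8080, "127.0.0.1", 80)` would expose the check on port 8080 for a websocket listener on port 80; both values are placeholders. If the upstream reply is not parseable as HTTP, this sketch reports 503 as well.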

I will monitor the server and send you the log when the connection becomes unresponsive again.

davehorton commented 5 years ago

Yes, let's see if you can get it to become unresponsive again, then let's look at the logs.

andrewvmail commented 5 years ago

Sounds good. By the way, unrelated: I was browsing your repos and found https://github.com/davehorton/drachtio-cpaas-portal. Looks interesting, I'm curious what's cooking.

andrewvmail commented 5 years ago

[Screenshot: ws endpoint not responding · Issue #78 · davehorton/drachtio-server] Interesting bug in GitHub.