Closed (kuon closed this 2 years ago)
I removed the GenServer and now use the WebSockex state directly, and the issue is gone. I guess WebSockex must be doing something that is not compatible with being nested in a GenServer.
I finally understood what was happening: it was not the GenServer's fault. It was just a coincidence.
The websocket endpoint was temporarily offline for a second. This triggered a disconnection, which killed the process. The supervisor would immediately restart the websocket process, but the endpoint wasn't back up yet, so the process died again and the supervisor restarted it again. This loop would hit the supervisor's `max_restarts` limit and kill the whole application.
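To make the restart loop concrete, here is a minimal sketch of the failure mode described above. `FlakyWorker` and `RestartDemo` are hypothetical names used only for illustration (they are not part of WebSockex or the Box app); the worker stands in for a websocket client whose endpoint is down. `max_restarts: 3` and `max_seconds: 5` are the `Supervisor` defaults, spelled out here for clarity.

```elixir
defmodule FlakyWorker do
  # Hypothetical stand-in for a websocket client whose endpoint is offline:
  # it starts successfully, then dies right away.
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, :ok)

  @impl true
  def init(:ok) do
    # Simulate the disconnect arriving just after startup.
    send(self(), :disconnect)
    {:ok, nil}
  end

  @impl true
  def handle_info(:disconnect, state), do: {:stop, :endpoint_down, state}
end

defmodule RestartDemo do
  # Hypothetical helper: starts a supervisor and waits for it to give up.
  def run do
    # Trap exits so the calling process survives the supervisor's exit.
    Process.flag(:trap_exit, true)

    {:ok, sup} =
      Supervisor.start_link([FlakyWorker],
        strategy: :one_for_one,
        max_restarts: 3,
        max_seconds: 5
      )

    receive do
      # More than 3 restarts within 5 seconds: the supervisor itself exits.
      {:EXIT, ^sup, reason} -> reason
    after
      2_000 -> :still_running
    end
  end
end
```

Calling `RestartDemo.run()` should show the supervisor giving up almost immediately (exiting with reason `:shutdown`) because the worker dies faster than the restart limit allows. When that supervisor is the application's top-level supervisor, the application stops with it.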
The solution I ended up using is to "wrap" the websocket process in an intermediate process. This intermediate process traps exits and manages an exponential backoff for reconnection attempts.
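The wrapper idea can be sketched roughly as follows. This is not the code from the thread, only a minimal illustration under stated assumptions: `BackoffWrapper`, its `backoff/1` helper, and the delay values are all hypothetical, and the wrapped process is passed in as a zero-arity function that returns `{:ok, pid}` or `{:error, reason}` (for example `fn -> Box.Socket.start_link() end`). `Integer.pow/2` needs Elixir 1.12 or later.

```elixir
defmodule BackoffWrapper do
  # Hypothetical module; not part of WebSockex or the original app.
  use GenServer

  @initial_delay 1_000
  @max_delay 30_000

  # start_fun: zero-arity function returning {:ok, pid} or {:error, reason}.
  def start_link(start_fun, opts \\ []) do
    GenServer.start_link(__MODULE__, start_fun, opts)
  end

  # Delay doubles with each failed attempt, capped at @max_delay.
  def backoff(attempt) do
    min(@initial_delay * Integer.pow(2, attempt), @max_delay)
  end

  @impl true
  def init(start_fun) do
    # Turn exits of linked processes into messages instead of dying with them.
    Process.flag(:trap_exit, true)
    send(self(), :connect)
    {:ok, %{start_fun: start_fun, pid: nil, attempt: 0}}
  end

  @impl true
  def handle_info(:connect, state) do
    case state.start_fun.() do
      {:ok, pid} ->
        # Connected: remember the child and reset the backoff counter.
        {:noreply, %{state | pid: pid, attempt: 0}}

      {:error, _reason} ->
        schedule_retry(state)
    end
  end

  # The wrapped process died (this sketch retries on any exit reason).
  def handle_info({:EXIT, pid, _reason}, %{pid: pid} = state) do
    schedule_retry(%{state | pid: nil})
  end

  # Exit from a process we no longer track, e.g. a failed start attempt.
  def handle_info({:EXIT, _other, _reason}, state), do: {:noreply, state}

  defp schedule_retry(state) do
    Process.send_after(self(), :connect, backoff(state.attempt))
    {:noreply, %{state | attempt: state.attempt + 1}}
  end
end
```

With this shape, the supervisor only ever sees the wrapper, which stays alive while the endpoint is down, so a short outage no longer counts against the supervisor's restart limit.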
Sorry, I've been sick/busy for the last week or so.
There is actually an actionable item here: better logging when the process crashes. Right now, if it's under a supervisor, the error is swallowed by the supervisor's exit trap and never shown.
There was an issue with getting stack traces correctly that prevented a good error message with a stacktrace when I first wrote WebSockex. That's since been fixed and needs to be implemented.
I have a WebSockex client inside a Phoenix app, and if I turn on the debug info, I have the following logs:
What is strange is that my whole Phoenix app (called Box) is stopped.
Also I am wondering why the websocket does not reconnect.
If I call `Application.ensure_all_started(:box)` from the iex prompt, it reconnects immediately.

My tree is:
- `Box.Application`: default Phoenix supervisor app, I just added `Box.SocketClient` to the children list.
- `Box.SocketClient`: is a GenServer, and in `init` I call `Box.SocketIO.Socket.start_link()`
- `Box.Socket`: has `use WebSockex`