bitfinexcom / bitfinex-api-go

BITFINEX Go trading API - Bitcoin, Litecoin, and Ether exchange
https://www.bitfinex.com/
MIT License
309 stars 220 forks source link

Book subscriptions, improved heartbeats #102

Closed brobits closed 6 years ago

brobits commented 6 years ago
mlensment commented 6 years ago

I see that you improved heartbeats, but what's the best way to recover from HB timeouts / other potential errors that might influence the socket? I'm currently thinking of setting the ReconnectAttempts parameter really high, but that does not seem too good of a solution.

The client is running in a separate goroutine and I tried to panic and recover (start the same goroutine again) when the socket closes. It works, but I encountered a dramatic increase in CPU usage. I traced the problem to be related with this line: https://github.com/bitfinexcom/bitfinex-api-go/blob/master/v2/websocket/transport.go#L56.

brobits commented 6 years ago

Could you explain the issue you're having with socket recovery?

Reconnects are handled by the API in two ways: the websocket connection is terminated (network disconnect) or a subscription channel's heartbeat timed out (logical disconnect). in both of these events, if ReconnectAttempts is greater than 0 and AutoReconnect is true, the API will automatically reconnect for you. For each reconnect failure, an internal counter increments until ReconnectAttempts are exhausted. In the event the API does successfully reconnect, the internal reconnect count is set back to 0 and the API may attempt its full reconnect cycle if another disconnect occurs.

If the API is unable to reconnect after ReconnectAttempts unsuccessful tries, the API will shut itself down into a terminal state and the API object is unusable. That's the path you've linked where you identify high CPU usage. You must create a new API object in this event. If you are not attempting to re-use the API object after reconnect failures terminate the API, and you continue to see high CPU usage, please let me know.

I don't recommend infinitely attempting reconnects, this will flood Bitfinex servers with reconnect attempts during outages. Killing your process if the API cannot reconnect is a good practice. Monitoring your service with a process watchdog & restarting during failure events is a standard out-of-band solution.

If you are determined to infinitely reconnect using the API, you could create your own branch and change https://github.com/bitfinexcom/bitfinex-api-go/blob/master/v2/websocket/client.go#L341 to something similar to:

for ; c.parameters.ReconnectAttempts < 0 || c.parameters.reconnectTry < c.parameters.ReconnectAttempts; c.parameters.reconnectTry++ {

and configure your API with ReconnectAttempts less than 0.

mlensment commented 6 years ago

Thank you for a really quick response. I omitted some parts, but the basic idea is this: https://gist.github.com/mlensment/33a160f6e6c6ced53a962fff4370d107

Clearly trying to reconnect every ~5 seconds when the outage is present is not viable, but I recon that after the initial reconnect sequence fails (ReconnectAttempts is exhausted), trying to connect every 10 minutes or so would be perfectly OK.

Since there is no two-tier reconnect mechanism built into the library, I went with panic/recover. Also it would be a good idea to notify Airbrake when tier one reconnect attempts failed. I could do that as well in the recover block.

Generally I would like to avoid maintaining my own branch, but yes, it would be an option.