Open panicfarm opened 1 year ago
Indeed, this library can auto-ping, but it lacks auto-reconnect. When the connection breaks, for now I suggest exiting the whole process and starting it again, for example, https://github.com/crypto-crawler/crypto-crawler-rs/blob/main/crypto-ws-client/src/common/ws_client_internal.rs#L197.
I'll think about how to support reconnecting.
I think it's imperative for any solid crypto data recording or trading code to be continuously and reliably runnable. Therefore a fast autoreconnect solution (with the smallest possible gap in the recorded data) is necessary.
Agreed, in theory it is possible to support auto-reconnect as long as we remember all subscribed symbols.
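The idea of remembering subscriptions so they can be replayed after a reconnect could be sketched roughly as follows. This is only an illustration: `SubscriptionRegistry` and its methods are hypothetical names that do not exist in crypto-ws-client, and commands are assumed to be plain strings.

```rust
/// Hypothetical registry of subscription commands (not part of crypto-ws-client).
/// It keeps commands in the order they were first sent.
#[derive(Debug, Default)]
pub struct SubscriptionRegistry {
    commands: Vec<String>,
}

impl SubscriptionRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Remember a command that was sent. Returns false if it was already known.
    pub fn record(&mut self, command: &str) -> bool {
        if self.commands.iter().any(|c| c == command) {
            return false;
        }
        self.commands.push(command.to_string());
        true
    }

    /// Forget a command, e.g. after an unsubscribe. Returns true if it was present.
    pub fn forget(&mut self, command: &str) -> bool {
        let before = self.commands.len();
        self.commands.retain(|c| c != command);
        self.commands.len() != before
    }

    /// Commands to send again on a fresh connection, in original order.
    pub fn replay(&self) -> &[String] {
        &self.commands
    }
}
```

After a reconnect, the client would iterate over `replay()` and send each command again on the new connection.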
If you could outline how the reconnect should occur, as well as where you think the best place is to store the subscribed symbols, I could probably submit a PR in a bit.
The original read error occurred in https://github.com/crypto-crawler/crypto-crawler-rs/blob/fb7b102190b7bb4eebafffb8ecddd184672d051f/crypto-ws-client/src/common/connect_async.rs#L104 . Then there was a subsequent, independent error (because the socket was already disconnected) in tokio-tungstenite at https://github.com/snapview/tokio-tungstenite/blob/87d2f7eb09a538c0a0ee77bd92e032e362118f72/src/lib.rs#L337 (an implementation of the futures_util::sink::Sink trait). The docs of start_send() say "In most cases, if the sink encounters an error, the sink will permanently be unable to receive items.", and thus we got repeated errors when sending pings. This tells me that after https://github.com/crypto-crawler/crypto-crawler-rs/blob/fb7b102190b7bb4eebafffb8ecddd184672d051f/crypto-ws-client/src/common/connect_async.rs#L104 it broke out of the loop in https://github.com/crypto-crawler/crypto-crawler-rs/blob/fb7b102190b7bb4eebafffb8ecddd184672d051f/crypto-ws-client/src/common/connect_async.rs#L76 and thus exited connect_async() ( https://github.com/crypto-crawler/crypto-crawler-rs/blob/fb7b102190b7bb4eebafffb8ecddd184672d051f/crypto-ws-client/src/common/connect_async.rs#L51 ) without an error, because the error was never caught there, it seems.
Perhaps we could catch the error on exit from connect_async_internal() (it appears that it actually returns an Ok() variant after breaking out of the receive loop), or at https://github.com/crypto-crawler/crypto-crawler-rs/blob/fb7b102190b7bb4eebafffb8ecddd184672d051f/crypto-ws-client/src/common/connect_async.rs#L53 , where the result is assigned to _, and initiate a reconnect?
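To illustrate the point about the swallowed error, here is a minimal sketch of a receive loop that reports why it ended instead of breaking out and returning Ok. It uses only the standard library: an iterator of `io::Result<String>` stands in for the websocket stream, and `Disconnect` and `receive_loop` are hypothetical names, not the library's actual types.

```rust
use std::io;

/// Why the receive loop ended. Hypothetical type, not from crypto-ws-client.
#[derive(Debug)]
pub enum Disconnect {
    /// The stream ended without a read error.
    Closed,
    /// Reading failed, e.g. with ECONNRESET (os error 104).
    ReadError(io::Error),
}

/// Drains `frames` until the stream ends or a read fails.
/// Each item stands in for the result of one read from the websocket stream.
pub fn receive_loop<I, F>(frames: I, mut on_message: F) -> Disconnect
where
    I: IntoIterator<Item = io::Result<String>>,
    F: FnMut(String),
{
    for frame in frames {
        match frame {
            Ok(msg) => on_message(msg),
            // Hand the failure to the caller instead of only logging it and breaking.
            Err(e) => return Disconnect::ReadError(e),
        }
    }
    Disconnect::Closed
}
```

With a return value like this, the caller can distinguish a clean end of stream from a read failure and decide whether to reconnect.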
PS. You can simulate KuCoin closing the websocket connection (sending RST, or ECONNRESET 104) by running tcpkill host ws-api-spot.kucoin.com and port 443 while the process is running.
Yes, an auto-reconnect solution is better than a pm2 restart.
@soulmachine should I work on a PR or will you add this functionality?
@panicfarm I would appreciate it if you could create a PR. I think you need to modify lines 197 and 221 in ws_client_internal.rs.
Currently the two lines just crash for simplicity; you'll need to modify them to support auto-reconnect.
@soulmachine When it disconnects due to ECONNRESET 104 (presumably when an exchange restarts its server), the error occurs on line 104 in connect_async.rs, and it returns from connect_async here without an error and without a message to be handled in the while loop here. Therefore it never gets to either line 197 or line 221 in ws_client_internal.rs. I think it should return an Error from connect_async here and handle the reconnect on matching that Error variant. Do you agree?
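The "return an error and reconnect on matching it" idea could look roughly like the sketch below. It is standard-library only and makes several assumptions: `SessionEnd`, `backoff_delay` and `run_with_reconnect` are hypothetical names, one call to `session` stands in for one full connect, subscribe and receive cycle, and the 100 ms base and 30 s cap are arbitrary choices. The wait is injected as a closure so that a real caller could pass a sleep.

```rust
use std::time::Duration;

/// How one connection session ended. Hypothetical type for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SessionEnd {
    /// The session finished on purpose; do not reconnect.
    Finished,
    /// The connection was lost; the string carries the reason.
    Disconnected(String),
}

/// Exponential backoff: base * 2^attempt, capped at `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

/// Runs `session` until it finishes cleanly, reconnecting after each disconnect.
/// Returns the number of reconnects, or the last reason once `max_retries` is used up.
pub fn run_with_reconnect<S, W>(mut session: S, mut wait: W, max_retries: u32) -> Result<u32, String>
where
    S: FnMut() -> SessionEnd,
    W: FnMut(Duration),
{
    let base = Duration::from_millis(100); // assumed short first delay to keep the data gap small
    let max = Duration::from_secs(30); // assumed upper bound on the delay
    let mut retries = 0u32;
    loop {
        match session() {
            SessionEnd::Finished => return Ok(retries),
            SessionEnd::Disconnected(reason) => {
                if retries >= max_retries {
                    return Err(reason);
                }
                wait(backoff_delay(retries, base, max));
                retries += 1;
            }
        }
    }
}
```

A longer-running version would probably reset the retry counter after a session that stayed connected for a while; that is left out here to keep the sketch short.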
But without modifying lines 197 and 221 in ws_client_internal.rs, it will not reconnect on a websocket Closed (a very common case, especially on Binance perp) or on some kind of Reconnect message.
@somefact Yes, there should be the same reconnection logic in all these places, although I have only encountered three disconnects on line 104 in connect_async.rs after running it for about 10 days in total with KuCoin.
@panicfarm You've made a very good point. I think you can start implementing the reconnection logic inside connect_async_internal().
Feel free to send a PR.
@soulmachine I have looked at the code in detail and would appreciate any feedback before I make changes.
When the exchange reboots its server, the disconnect happens here. However, by this time you have already returned from connect_async_internal, because you drop the handle returned by tokio::task::spawn, and therefore the program returns from ws_client.send(&commands).await and you are now in https://github.com/crypto-crawler/crypto-crawler-rs/blob/fb7b102190b7bb4eebafffb8ecddd184672d051f/crypto-crawler/src/crawlers/kucoin.rs#L24.
I could hold on to the send task handle
    let send_task_handle = tokio::task::spawn(async move {
        loop {
            tokio::select! {
                command = command_rx.recv() => {
                    match command {
                        Some(command) => {
                            match command {
                                ...
and pass it into run(), and then when the error occurs, error out in run and enclose the entire send(); run(); block in each exchange (unfortunately) in a reconnect loop, something like
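    loop {
        // tx is cloned so that a fresh client can be built on every iteration
        let ws_client = KuCoinSpotWSClient::new(tx.clone(), None).await;
        match ws_client.send(&commands).await {
            Err(_) => continue, // sending failed: reconnect
            Ok(send_task_handle) => match ws_client.run(send_task_handle).await {
                Err(_) => continue, // disconnected while running: reconnect
                Ok(()) => {}
            },
        }
        ws_client.close().await;
        break;
    }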
Alternatively, I could somehow cram the reconnect logic into https://github.com/crypto-crawler/crypto-crawler-rs/blob/fb7b102190b7bb4eebafffb8ecddd184672d051f/crypto-ws-client/src/common/connect_async.rs#L59 , but I am not sure it's a good idea, because by this time you are already in the run() function, except that the task you spawned in the send() function is also running separately.
@panicfarm The first way looks ugly because it makes run() return something weird, which looks scary to users; run should always return void.
The second way looks better: handling the reconnect in connect_async_internal() so that it is invisible to users.
@soulmachine I don't think I can completely confine it inside connect_async_internal(ws_stream: WebSocketStream, ...), because after that disconnect the ws_stream that is passed into the function is no longer connected? At a minimum I have to do it in connect_async().
But what about the other cases of disconnect mentioned in this issue? They happen inside run(). It would be nice to unify them all.
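One way to picture unifying the disconnect cases mentioned in this thread (read error, failed ping send, websocket Closed, a Reconnect message) is to map them all onto a single decision point, as in the sketch below. `ConnectionEvent`, `Action` and `classify` are hypothetical names for illustration and do not reflect how crypto-ws-client is actually structured.

```rust
/// Events a connection can produce. Hypothetical type for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ConnectionEvent {
    /// A normal data message from the exchange.
    Message(String),
    /// Reading from the socket failed (e.g. os error 104).
    ReadError(String),
    /// Sending a ping failed because the sink is broken.
    PingSendFailed(String),
    /// The exchange sent a websocket Close frame.
    CloseFrame,
    /// The exchange asked the client to reconnect.
    ReconnectRequested,
}

/// What the client should do in response to an event.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Action {
    Deliver(String),
    Reconnect,
}

/// Single place where every kind of disconnect leads to the same reconnect path.
pub fn classify(event: ConnectionEvent) -> Action {
    match event {
        ConnectionEvent::Message(msg) => Action::Deliver(msg),
        ConnectionEvent::ReadError(_)
        | ConnectionEvent::PingSendFailed(_)
        | ConnectionEvent::CloseFrame
        | ConnectionEvent::ReconnectRequested => Action::Reconnect,
    }
}
```

Because the match is exhaustive, adding a new event variant later forces a decision about whether it should trigger a reconnect.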
Hey fellas, did this fix ever get done?
I left the basic README websocket example running for a few days, and then ECONNRESET 104 Connection reset by peer occurred and the websocket disconnected. After that, ping/pongs were failing, but there seems to be no auto-reconnection mechanism. I reproduced this several times. I think it should reconnect on
ERROR crypto_ws_client::common::connect_async] Failed to read, error: IO error: Connection reset by peer (os error 104)
and also on failed ping sends. The log at the moment of the disconnect (2023-01-13T12:44:15Z) follows: