Open drnick23 opened 7 months ago
This also happened here. Normally drsolver recovers on its own, but this time it stopped solving and needed a restart:
[2024-02-26 01:42:34.905] [error] DrError code: 203
[2024-02-26 01:42:39.818] [info] Connection closed. Code: 1006 Reason: Abnormal closure
[2024-02-26 01:42:40.523] [info] Connection closed.
[2024-02-26 01:42:40.523] [info] Connecting to server https://solvers.drplotter.com...
[2024-02-26 01:42:40.988] [info] Connected to server! (465.076953 ms)
[2024-02-26 01:42:40.989] [info] Solving registration challenge...
[2024-02-26 01:42:42.195] [info] success! (1205.47763 ms)
[2024-02-26 01:42:42.997] [info] Connection established
[2024-02-26 02:03:48.345] [error] DrError code: 203
[2024-02-26 02:03:53.818] [info] Connection closed. Code: 1006 Reason: Abnormal closure
[2024-02-26 02:03:54.756] [info] Connection closed.
[2024-02-26 02:03:54.757] [info] Connecting to server https://solvers.drplotter.com...
[2024-02-26 02:03:55.227] [info] Connected to server! (470.116697 ms)
[2024-02-26 02:03:55.227] [info] Solving registration challenge...
[2024-02-26 02:03:56.438] [info] success! (1210.412807 ms)
[2024-02-26 02:03:57.242] [error] Connection error: Expecting status 101 (Switching Protocol), got 520 status connecting to wss://solvers.drplotter.com/solver, HTTP Status line: HTTP/1.1 520
As a workaround, I use this script, "DrWatchdog", to recover automatically:
```
drplotter@machine:~$ cat DrWatchdog
#!/bin/bash
# [2024-02-26 09:16:33.700] [error] Connection error: Expecting status 101 (Switching Protocol), got 520 status connecting to wss://solvers.drplotter.com/solver, HTTP Status line: HTTP/1.1 520
# #retries: 0 wait time(ms):0 http status:520
echo "This is DrWatchdog"
while true
do
    sleep 60
    # Restart drsolver if one of the last two log lines reports a connection error
    tail -2 "$HOME/.drplotter/logs/drsolver_gpu_0.log" | grep "Connection error:" && {
        echo "DrSolver hangs $(date)"
        killall -9 drsolver
        sleep 1
        # Needs nohup, else it stops, probably trying to read from stdin
        nohup drsolver > drsolver.out &
        echo "You can run tail -f drsolver.out as the initial process was killed, recover the terminal with tset and reset"
    }
    sleep 60
done
```
retries: 0 wait time(ms):0 http status:520
This happens after a DrError code: 203: the solver disconnects, reconnects to the server, and solves the registration challenge successfully, and then the connection fails with HTTP status 520.
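
For anyone who wants to check a log for this sequence, here is a minimal sketch of a helper that reports how the most recent connection attempt ended. `last_connection_state` is a hypothetical name, not part of drsolver, and the sketch assumes the log messages read exactly like the lines quoted in this thread ("Connection established" and "Connection error:"). Unlike `tail -2`, it still finds the error if unrelated lines were logged after it.

```bash
#!/bin/bash
# Hypothetical helper, not part of drsolver.
# Prints "ok" if the most recent connection-related line in the log is
# "Connection established", "error" if it is "Connection error:", and
# "unknown" if neither message appears or the file cannot be read.
# Assumption: message wording matches the log excerpts in this thread.
last_connection_state() {
    local logfile="$1"
    if [ ! -r "$logfile" ]; then
        echo "unknown"
        return 1
    fi
    awk '
        /Connection established/ { state = "ok" }
        /Connection error:/      { state = "error" }
        END { print (state == "" ? "unknown" : state) }
    ' "$logfile"
}
```

In a watchdog loop this could be used as `[ "$(last_connection_state "$HOME/.drplotter/logs/drsolver_gpu_0.log")" = "error" ]` in place of the `tail | grep` test. Note that it would also trigger while drsolver is still retrying on its own, so the polling interval probably needs to stay generous.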