Closed: veox closed this issue 5 years ago
Without digging in at all, this looks like we need graceful reconnection upon losing the websocket connection (with some sort of backstop to limit how many attempts to reconnect will be made).
> with some sort of backstop to limit how many attempts to reconnect will be made
Or maybe an exponential increase in time between reconnect attempts, up to some maximum, like once every 24 hours. (which means the operator doesn't have to reboot if the stats server has been offline for a long time)
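For illustration, the capped exponential backoff idea could look roughly like the sketch below. This is not Trinity code: `backoff_delays`, `run_with_reconnect` and the `session` callable are hypothetical names, and the use of `ConnectionError` as the "connection lost" signal is an assumption made only for this example.

```python
import asyncio
from typing import Awaitable, Callable, Iterator


def backoff_delays(initial: float = 1.0,
                   factor: float = 2.0,
                   maximum: float = 24 * 60 * 60) -> Iterator[float]:
    """Yield delays that grow by `factor` and then stay at `maximum` forever."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * factor, maximum)


async def run_with_reconnect(
    session: Callable[[], Awaitable[None]],
    delays: Iterator[float],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> None:
    """Run `session` until it returns normally, sleeping between failed attempts.

    `session` is a hypothetical coroutine function that connects, reports
    stats, and raises ConnectionError when the connection is lost.
    """
    for delay in delays:
        try:
            await session()
            return  # clean shutdown, stop retrying
        except ConnectionError:
            # Never hard-fail: wait (up to the cap) and try again.
            await sleep(delay)
```

Because the generator never ends, the node keeps retrying at the maximum interval without needing a restart, and an unreachable stats server never takes the node down.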
@carver good call on not hard failing. I didn't think about the fact that an ethstats outage could then take all of the connected trinity nodes offline, which is something we do not want.
It looks like the EthstatsService does have code that is supposed to deal with lost connections and reconnect.
However, it doesn't seem to be working as expected. I noticed this exception, and the service did not recover (reconnect) after it.
ERROR 09-14 04:42:49 EthstatsClient Unexpected error in <trinity.plugins.builtin.ethstats.ethstats_client.EthstatsClient object at 0x7f909ac8ecd0>, exiting
Traceback (most recent call last):
  File "/usr/src/app/trinity/p2p/service.py", line 118, in run
    await self._run()
  File "/usr/src/app/trinity/trinity/plugins/builtin/ethstats/ethstats_client.py", line 52, in _run
    self.recv_handler(),
  File "/usr/src/app/trinity/p2p/cancellable.py", line 43, in wait_first
    return await token_chain.cancellable_wait(*awaitables, timeout=timeout)
  File "/usr/local/lib/python3.7/site-packages/cancel_token/token.py", line 178, in cancellable_wait
    return done.pop().result()
  File "/usr/src/app/trinity/trinity/plugins/builtin/ethstats/ethstats_client.py", line 58, in recv_handler
    json_string: str = await self.websocket.recv()
  File "/usr/local/lib/python3.7/site-packages/websockets/protocol.py", line 352, in recv
    yield from self.ensure_open()
  File "/usr/local/lib/python3.7/site-packages/websockets/protocol.py", line 514, in ensure_open
    self.close_code, self.close_reason) from self.transfer_data_exc
websockets.exceptions.ConnectionClosed: WebSocket connection is closed: code = 1000 (OK), no reason
DEBUG 09-14 04:42:49 EthstatsClient <trinity.plugins.builtin.ethstats.ethstats_client.EthstatsClient object at 0x7f909ac8ecd0> halted cleanly
I haven't investigated, but it looks like the exception doesn't reach the except that is supposed to catch it.
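As a point of comparison, here is a minimal sketch in plain asyncio of where such an exception would be expected to surface. It is not Trinity's code and does not use cancel_token: `ConnectionClosed` is a local stand-in class, `wait_first` is a simplified re-implementation written for this example, and `stats_handler` and `run_once` are hypothetical names. Only `recv_handler` and the `done.pop().result()` pattern are taken from the traceback.

```python
import asyncio


class ConnectionClosed(Exception):
    """Local stand-in for websockets.exceptions.ConnectionClosed."""


async def wait_first(*coros):
    """Return (or re-raise) the outcome of whichever coroutine finishes first."""
    tasks = [asyncio.ensure_future(coro) for coro in coros]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    # Let the cancelled tasks finish unwinding before we return.
    await asyncio.gather(*pending, return_exceptions=True)
    # result() re-raises the exception stored on the finished task, if any.
    return done.pop().result()


async def recv_handler():
    await asyncio.sleep(0)
    raise ConnectionClosed("code = 1000 (OK), no reason")


async def stats_handler():
    await asyncio.sleep(3600)  # pretend to report stats for a long time


async def run_once() -> str:
    # The except has to wrap the await that re-raises; an except placed
    # around a different call would never see the exception.
    try:
        await wait_first(recv_handler(), stats_handler())
    except ConnectionClosed:
        return "reconnect"
    return "finished"
```

In this sketch the exception is re-raised at the `await wait_first(...)` call, so that is the call the handler has to enclose. Whether the same holds for the actual service code has not been verified here.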
> I haven't investigated but it looks like the exception doesn't reach the except that is supposed to catch it.
Just an interesting tidbit: the new TrioService would have worked correctly, as Manager.run() correctly raises service exceptions when it terminates.
git describe --tags: trinity-v0.1.0-alpha.16-71-gca9f90a6
python --version: Python 3.7.0
pip freeze: MISSING
What is wrong?
On remote netstats (ethstats server) restarting, a trinity node (at least when run in LES light-client mode) that's been previously connected to it no longer appears in the list.
Run as (via systemd):
INFO-level trace (for some reason, it's printed twice):
How can it be fixed
Haven't investigated.