Closed L235 closed 2 years ago
Honestly, if on_data_error would just quit the process that would be satisfactory.
The issue's branch should update the bot to try to reconnect for any error that causes a disconnection and quit if it isn't able to reconnect or if the error is unrelated to a disconnection. I haven't tested the quitting part but in my test it does handle ping timeouts by attempting to reconnect.
Hm, I'm not sure this is working. On 206f4cc409d8e0fe4d40ba8b9f147a849c286f81, here is what happens if I use NickServ to GHOST the bot running this branch (to simulate an unexpected quit):
ERROR:pydle.client:Unexpected disconnect. Attempting to reconnect within 5 seconds.
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-4' coro=<BasicClient.handle_forever() done, defined at /home/kevin/.local/share/virtualenvs/hat-collector-WgNW31Vi/lib/python3.9/site-packages/pydle/client.py:363> exception=SSLError(1, '[SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2756)')>
Traceback (most recent call last):
File "/home/kevin/.local/share/virtualenvs/hat-collector-WgNW31Vi/lib/python3.9/site-packages/pydle/client.py", line 367, in handle_forever
data = await self.connection.recv(timeout=self.READ_TIMEOUT)
File "/home/kevin/.local/share/virtualenvs/hat-collector-WgNW31Vi/lib/python3.9/site-packages/pydle/connection.py", line 115, in recv
return await asyncio.wait_for(self.reader.readline(), timeout=timeout)
File "/usr/lib/python3.9/asyncio/tasks.py", line 481, in wait_for
return fut.result()
File "/usr/lib/python3.9/asyncio/streams.py", line 540, in readline
line = await self.readuntil(sep)
File "/usr/lib/python3.9/asyncio/streams.py", line 632, in readuntil
await self._wait_for_data('readuntil')
File "/usr/lib/python3.9/asyncio/streams.py", line 517, in _wait_for_data
await self._waiter
File "/usr/lib/python3.9/asyncio/sslproto.py", line 528, in data_received
ssldata, appdata = self._sslpipe.feed_ssldata(data)
File "/usr/lib/python3.9/asyncio/sslproto.py", line 206, in feed_ssldata
self._sslobj.unwrap()
File "/usr/lib/python3.9/ssl.py", line 948, in unwrap
return self._sslobj.shutdown()
ssl.SSLError: [SSL: APPLICATION_DATA_AFTER_CLOSE_NOTIFY] application data after close notify (_ssl.c:2756)
The application does not quit and the bot does not reconnect:
00:41 *** hats_alpha (~hats_alph@wikimedia/bot/hat-collector) has quit (Killed (NickServ (GHOST command used by hats_beta!~hats_beta@wikimedia/bot/hat-collector)))
(no reconnect)
I'm not able to reproduce this. On 206f4cc409d8e0fe4d40ba8b9f147a849c286f81 this is what I get when I start the bot, wait for it to connect, then ghost it:
INFO:root:Connecting to DB
INFO:root:Preparing bot
INFO:root:Running bot
INFO:root:Syncing report channels
INFO:root:Running query: SELECT name FROM channels
INFO:root:Syncing rules
INFO:root:Running query: SELECT wiki, type, pattern, channel, ignore FROM rules ORDER BY ignore DESC
ERROR:pydle.client:Unexpected disconnect. Attempting to reconnect within 5 seconds.
INFO:root:Syncing report channels
INFO:root:Running query: SELECT name FROM channels
INFO:root:Syncing rules
INFO:root:Running query: SELECT wiki, type, pattern, channel, ignore FROM rules ORDER BY ignore DESC
The reconnection isn't even handled through part of this repos code, it's done in pydle's on_disconnect
method here:
https://github.com/Shizmob/pydle/blob/286e53363e72e8c10f09ee60c51f48ac5a9d8785/pydle/client.py#L321-L340
It looks like during your reconnection an error is thrown at https://github.com/Shizmob/pydle/blob/286e53363e72e8c10f09ee60c51f48ac5a9d8785/pydle/client.py#L367 but then it's just ignored. I'm not sure if the ignoring is pydle's fault or ours but I am able to reproduce that by making a !command throw an exception.
Overriding on_disconnect
and having it do nothing might be a workaround for this issue but figuring out why exceptions are being ignored or why we're experiencing different behavior might lead to a better solution.
I think I forgot to mention I made a commit b9eae087f9cb9ca6c733cb3c569e1d9710f5316c that might fix this. The change should prevent the bot from attempting to reconnect after an unexpected disconnection which will (hopefully) end the process and allow it to be restarted by a watchdog. I don't think I tested this at all myself since I wasn't able to reproduce the original error.
I think I forgot to mention I made a commit b9eae08 that might fix this. The change should prevent the bot from attempting to reconnect after an unexpected disconnection which will (hopefully) end the process and allow it to be restarted by a watchdog. I don't think I tested this at all myself since I wasn't able to reproduce the original error.
b9eae087f9cb9ca6c733cb3c569e1d9710f5316c likely fixed any remaining issues. I haven't noticed any unresolved disconnects. Thanks very much; closing.
Describe the bug When the bot experiences a ping timeout/connection reset it should exit so it can be restarted. However it just keeps running, with the application still running but not on IRC.
To reproduce Steps to reproduce the behavior:
Expected behavior The application should quit, allowing the daemon to restart it.
Console output
Environment:
Additional context Add any other context about the problem here.