OpenTTD / dibridge

Discord <-> IRC Bridge
GNU General Public License v2.0
15 stars 3 forks source link

Strange error caused the bridge to stop relaying messages #229

Closed poVoq closed 7 months ago

poVoq commented 7 months ago

I recently saw that dibridge stopped relaying messages in a channel and while a restart fixed that the logs have a strange error:

IRCBridge[1052527]: AttributeError: '_MissingSentinel' object has no attribute 'call_soon_threadsafe'
IRCBridge[1052527]: 2024-04-09 13:57:16 ERROR    [asyncio] Task exception was never retrieved
IRCBridge[1052527]: future: <Task finished name='Task-46858' coro=<IRCRelay._relay_mesage() done, defined at /dibridge/irc.py:1>
IRCBridge[1052527]: Traceback (most recent call last):
IRCBridge[1052527]:   File "/dibridge/irc.py", line 220, in _relay_mesage
IRCBridge[1052527]:     relay.DISCORD.send_message(irc_username, message)
IRCBridge[1052527]:   File "/dibridge/discord.py", line 187, in send_message
IRCBridge[1052527]:     asyncio.run_coroutine_threadsafe(self._send_message(irc_username, message), self.loop)
IRCBridge[1052527]:   File "/usr/lib64/python3.11/asyncio/tasks.py", line 936, in run_coroutine_threadsafe
IRCBridge[1052527]:     loop.call_soon_threadsafe(callback)
IRCBridge[1052527]:     ^^^^^^^^^^^^^^^^^^^^^^^^^

This repeated over and over in the log.

TrueBrain commented 7 months ago

That means the loop was made sentinal (basically, NULL'd), which is .. odd, to say the least. I did not know Python was capable of doing that.

I have not seen this error myself, but we are also running on Python 3.10. That might or might not be related.

This needs a bit of investigating :D A very interesting (but concerning) problem :P

TrueBrain commented 7 months ago

Ah, no, I was misreading the backtrace, and misremembering what the code does :)

It appears that the Discord connection was closed. This also resets the loop variable (to a sentinel value). So that explains the issue there.

This might happen when there is a connection error, but I thought it self-restored. Clearly, it does not. But that is an easy thing to experiment with, as that should be reproduceable :D Will look into it further!

poVoq commented 7 months ago

The odd thing is that I run multiple dibridges in parallel on the same server and only one was affected, so this would speak against an connection issue.

TrueBrain commented 7 months ago

The odd thing is that I run multiple dibridges in parallel on the same server and only one was affected, so this would speak against an connection issue.

Connection error is more than a connection issue on your side :) In this case, it seems to be the Discord-side.

In more detail, this happens if the Discord backend disconnect us with an error-code that the Discord library is not used to handle. Which means it is a bit undefined what one can do to recover.

One could argue that you can just reconnect to Discord and try again. But the docs suggest that these issues only happen when there is a configuration error on the bot side (wrong token, wrong shard, etc). So it is likely to happen again. But in this case it seems more likely that the Discord platform made a boo-boo, and booted the bot for the wrong reasons. But to differentiate between those scenarios is tricky.

Possibly a way to recover is to just let the whole bridge crash. If you are like me, it will just restart, and continue its operation. But that feels a bit meh. Need to think a bit more about this :)

TrueBrain commented 7 months ago

One more question for you: how far back do your logs go? It is likely there was a backtrace just before these errors you pasted started. It would be interesting to see that error, if possible?

PS: https://discord.com/developers/docs/topics/opcodes-and-status-codes#gateway-gateway-close-event-codes these are the possible reasons the connection got closed. A few stand out, for example "rate limited" could ofc happen if IRC is really chatty. Or "session timed out" if the bridge is just connected for too long to Discord. But the Discord library I use handles these cases poorly, so I will need to wrap around those cases :)

poVoq commented 7 months ago

No, the bridged channel is pretty quiet, so I doubt is was a rate limit error.

Edit: Ok this seems to be what you wanted:

IRCBridge[1052527]: Exception in thread Thread-1 (start):
IRCBridge[1052527]: Traceback (most recent call last):
IRCBridge[1052527]:   File "/usr/lib64/python3.11/threading.py", line 1045, in _bootstrap_inner
IRCBridge[1052527]:     self.run()
IRCBridge[1052527]:   File "/usr/lib64/python3.11/threading.py", line 982, in run
IRCBridge[1052527]:     self._target(*self._args, **self._kwargs)
IRCBridge[1052527]:   File "/dibridge/discord.py", line 198, in start
IRCBridge[1052527]:     relay.DISCORD.run(token, log_handler=None)
IRCBridge[1052527]:   File "/dibridge/.env/lib64/python3.11/site-packages/discord/client.py", line 860, in run
IRCBridge[1052527]:     asyncio.run(runner())
IRCBridge[1052527]:   File "/usr/lib64/python3.11/asyncio/runners.py", line 190, in run
IRCBridge[1052527]:     return runner.run(main)
IRCBridge[1052527]:            ^^^^^^^^^^^^^^^^
IRCBridge[1052527]:   File "/usr/lib64/python3.11/asyncio/runners.py", line 118, in run
IRCBridge[1052527]:     return self._loop.run_until_complete(task)
IRCBridge[1052527]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IRCBridge[1052527]:   File "/usr/lib64/python3.11/asyncio/base_events.py", line 654, in run_until_complete
IRCBridge[1052527]:     return future.result()
IRCBridge[1052527]:            ^^^^^^^^^^^^^^^
IRCBridge[1052527]:   File "/dibridge/.env/lib64/python3.11/site-packages/discord/client.py", line 849, in runner
IRCBridge[1052527]:     await self.start(token, reconnect=reconnect)
IRCBridge[1052527]:   File "/dibridge/.env/lib64/python3.11/site-packages/discord/client.py", line 778, in start
IRCBridge[1052527]:     await self.connect(reconnect=reconnect)
IRCBridge[1052527]:   File "/dibridge/.env/lib64/python3.11/site-packages/discord/client.py", line 659, in connect
IRCBridge[1052527]:     await self.ws.poll_event()
IRCBridge[1052527]:   File "/dibridge/.env/lib64/python3.11/site-packages/discord/gateway.py", line 626, in poll_event
IRCBridge[1052527]:     raise msg.data
IRCBridge[1052527]:   File "/dibridge/.env/lib64/python3.11/site-packages/aiohttp/client_ws.py", line 244, in receive
IRCBridge[1052527]:     msg = await self._reader.read()
IRCBridge[1052527]:           ^^^^^^^^^^^^^^^^^^^^^^^^^
IRCBridge[1052527]:   File "/dibridge/.env/lib64/python3.11/site-packages/aiohttp/streams.py", line 663, in read
AIRCBridge[1052527]:     return await super().read()
IRCBridge[1052527]:            ^^^^^^^^^^^^^^^^^^^^
IRCBridge[1052527]:   File "/dibridge/.env/lib64/python3.11/site-packages/aiohttp/streams.py", line 622, in read
IRCBridge[1052527]:     await self._waiter
IRCBridge[1052527]:   File "/dibridge/.env/lib64/python3.11/site-packages/aiohttp/http_websocket.py", line 314, in feed_data
IRCBridge[1052527]:     return self._feed_data(data)
IRCBridge[1052527]:            ^^^^^^^^^^^^^^^^^^^^^
IRCBridge[1052527]:   File "/dibridge/.env/lib64/python3.11/site-packages/aiohttp/http_websocket.py", line 321, in _feed_data
IRCBridge[1052527]:     for fin, opcode, payload, compressed in self.parse_frame(data):
IRCBridge[1052527]:                                             ^^^^^^^^^^^^^^^^^^^^^^
IRCBridge[1052527]:   File "/dibridge/.env/lib64/python3.11/site-packages/aiohttp/http_websocket.py", line 486, in parse_frame
IRCBridge[1052527]:     raise WebSocketError(
IRCBridge[1052527]: aiohttp.http_websocket.WebSocketError: Received fragmented control frame
IRCBridge[1052527]: Exception in thread Thread-334:
IRCBridge[1052527]: Traceback (most recent call last):
IRCBridge[1052527]:   File "/usr/lib64/python3.11/threading.py", line 1045, in _bootstrap_inner
IRCBridge[1052527]:     self.run()
IRCBridge[1052527]:   File "/dibridge/.env/lib64/python3.11/site-packages/discord/gateway.py", line 169, in run
IRCBridge[1052527]:     f = asyncio.run_coroutine_threadsafe(coro, loop=self.ws.loop)
IRCBridge[1052527]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IRCBridge[1052527]:   File "/usr/lib64/python3.11/asyncio/tasks.py", line 936, in run_coroutine_threadsafe
IRCBridge[1052527]:     loop.call_soon_threadsafe(callback)
IRCBridge[1052527]:   File "/usr/lib64/python3.11/asyncio/base_events.py", line 807, in call_soon_threadsafe
IRCBridge[1052527]:     self._check_closed()
IRCBridge[1052527]:   File "/usr/lib64/python3.11/asyncio/base_events.py", line 520, in _check_closed
IRCBridge[1052527]:     raise RuntimeError('Event loop is closed')
IRCBridge[1052527]: RuntimeError: Event loop is closed
IRCBridge[1052527]: /usr/lib64/python3.11/threading.py:1047: RuntimeWarning: coroutine 'DiscordWebSocket.send_heartbeat' was never awaited
IRCBridge[1052527]:   self._invoke_excepthook(self)
IRCBridge[1052527]: RuntimeWarning: Enable tracemalloc to get the object allocation traceback
poVoq commented 7 months ago

Updated the above error log, sorry for the previous one I mistakenly thought was not the same.

TrueBrain commented 7 months ago

Thank you! So the root-cause was Received fragmented control frame. Which terminated the websocket to Discord, which is an event not handled by the Discord library, and this bridge simply does nothing with it, causing the other exceptions you see.

That is fixable :)

TrueBrain commented 7 months ago

Right, should be fixed, and not resurface again :) It now basically reconnects automatically in such events.

Thank you for the reports and additional information. Dibridge is now one step better than it was before :D