Teekeks / pyTwitchAPI

A Python 3.7 compatible implementation of the Twitch API, EventSub, PubSub and Chat
https://pytwitchapi.dev
MIT License
254 stars 38 forks source link

Handle broken aiohttp websocket in chat bot client #271

Closed Latent-Logic closed 11 months ago

Latent-Logic commented 11 months ago

It turns out that aiohttp WebSocket code does not recognize if the underlying connection is broken: https://github.com/aio-libs/aiohttp/issues/2309 As a twitch client we know we'll at minimum receive a PING request every 5 minutes even if no other channel updates are happening: https://dev.twitch.tv/docs/irc/#keepalive-messages (I tested this and saw every a ping request always happened between every 4min 1sec to 4min 51sec)

If we miss a PING and fail to respond twitch considers the connection closed. Thus we can add a timeout to our websocket receive() call and if we've heard nothing in over 5 minutes our connection is broken and we should perform a reconnect. This has been tested running the bot, and then trying 3 different types of connection breaking: 1) Disabling the network interface on my computer for 5 minutes 2) Suspending laptop and resuming later 3) Dropping related active NAT state on router (similar to rebooting router)

Teekeks commented 11 months ago

I feel like going within seconds of a undocumented time period for a timeout seems unreasonable close.

Since its undocumented twitch could decide at any moment to change that value to e.g. 6 minutes or more resulting in frequent reconnects for some small bots.

Putting the default value to e.g. 10 minutes and making that value configurable should be the minimum.

Latent-Logic commented 11 months ago

Makes sense, I'll have it be a class variable settable in the init method.

Latent-Logic commented 11 months ago

If you want I could make it an optional variable that defaults to None and thus keep the current "hang forever" behavior, but I do think it is better to default to a state where it will reconnect if it hangs after too long.

Teekeks commented 11 months ago

having the timeout is a good idea, however it would also be a good idea to have None as a valid value which it currently is not due to the *60

Latent-Logic commented 11 months ago

None is now valid, and changed the type from int to float as an int is a valid value for a float, and a user might want to specify 5.5 minutes or something if they want to give a different timeout (and WebSocketResponse.receive timeout is timeout: Optional[float] = None)

https://peps.python.org/pep-0484/#the-numeric-tower

Rather than requiring that users write import numbers and then use numbers.Float etc., this PEP proposes a straightforward shortcut that is almost as effective: when an argument is annotated as having type float, an argument of type int is acceptable