Respect Discord login limits with JDA in mind

schnapster commented 7 years ago

So the coin system works ok, but JDA itself does some reconnecting too which uses up the rates.

Find a way to work together with JDA (the hardcore option would be to turn off the reconnects and handle them fully ourselves). We might also discuss whether the original issue what we added the watchdog for has been resolved (or maybe a misunderstanding on our side about a lot of shards taking up a lot of time to get revived by JDA?).

Below is a problematic log from today showing how both JDA and our own system try to start shards resulting in rate limit hits (I think I grepped all relevant actors there): https://hastebin.com/fayanayaze.css JDA uses up a login when it logs (Re)Loading The first batch (at 10:16 log time) was triggered by a stop the world garbage collection as there are a few seconds missing in the log:

[10:15:34] [ INFO ] [EventListenerBoat] [10:16:16] [ ERROR] [LocalAudioTrackExecutor]

and there is a corresponding STW event in the gc log. So the Watchdog started killing the shards from shard 0 onwards, but stopped killing them as they started getting events again.

The second one appears to be a network issue where some of the shards could be resumed by JDA without a login, some were resumed using up a log in, and some were killed by the Watchdog. The problem here probably is that reviving so many shards takes up so much time for JDA (who appears to rate limits logins itself, probably by adhering to what discord reports back as left ratelimits when you make a request), that the shards get picked up as inactive by the watchdog.

Solutions to explore:

Check the status of shards more closely by the watchdog, for example don't kill shards that are reported by JDA as "reviving" (idk if such a thing is reported)
Have the Watchdog run a lot more infrequently so that JDA gets a chance to revive the shards first
Handle reconnecting completely by ourselves

freyacodes commented 7 years ago

Resuming does not use up the rate limit FYI. Jake explicitly pointed that out

schnapster commented 7 years ago

JDA does both, resuming and a full login. They told me the full thing happens when theres a (Re)Loading logged, and so far logs are also right about - when we are restarting shards, and JDA (Re)loads some of them too, we run into the discord rate limit. Here's another log from today:

Watchdog starts a few revive tasks, then while those are going JDA (Re)loads a few shards and ratelimits happen:

[05:22:13] [ WARN ] [ShardWatchdogAgent] Reviving shard [26 / 309] after 60 seconds of no events. Last event received was UserOnlineStatusUpdateEvent
[05:22:13] [ INFO ] [FredBoatBot] Reviving shard 25
[05:22:13] [ WARN ] [ShardWatchdogAgent] Reviving shard [28 / 309] after 60 seconds of no events. Last event received was UserOnlineStatusUpdateEvent
[05:22:13] [ INFO ] [FredBoatBot] Reviving shard 26
[05:22:13] [ WARN ] [ShardWatchdogAgent] Reviving shard [29 / 309] after 60 seconds of no events. Last event received was MessageEmbedEvent
[05:22:13] [ INFO ] [FredBoatBot] Reviving shard 28
[05:22:13] [ WARN ] [ShardWatchdogAgent] Reviving shard [30 / 309] after 60 seconds of no events. Last event received was UserOnlineStatusUpdateEvent
[05:22:13] [ INFO ] [FredBoatBot] Reviving shard 29
[05:22:13] [ WARN ] [ShardWatchdogAgent] Reviving shard [31 / 309] after 60 seconds of no events. Last event received was UserGameUpdateEvent
[05:22:13] [ INFO ] [FredBoatBot] Reviving shard 30
[05:22:13] [ INFO ] [FredBoatBot] Reviving shard 31
[05:22:14] [ WARN ] [ShardWatchdogAgent] Reviving shard [186 / 309] after 61 seconds of no events. Last event received was UserGameUpdateEvent
[05:22:14] [ INFO ] [FredBoatBot] Reviving shard 186
[05:22:14] [ WARN ] [ShardWatchdogAgent] Reviving shard [198 / 309] after 61 seconds of no events. Last event received was UserOnlineStatusUpdateEvent
[05:22:14] [ INFO ] [FredBoatBot] Reviving shard 198
[05:22:17] [ INFO ] [JDA] Finished (Re)Loading!
[05:22:18] [ INFO ] [FredBoat] Coin for shard 186
[05:22:18] [ INFO ] [JDA] Finished (Re)Loading!
[05:22:20] [ INFO ] [JDA] Finished Loading!
[05:22:22] [ INFO ] [JDA] Finished (Re)Loading!
[05:22:22] [ INFO ] [JDA] Finished Loading!
[05:22:23] [ INFO ] [FredBoat] Coin for shard 5
[05:22:28] [ INFO ] [JDA] Finished Loading!
[05:22:28] [ INFO ] [FredBoat] Coin for shard 29
[05:22:30] [ INFO ] [JDA] Finished Loading!
[05:22:34] [ INFO ] [FredBoat] Coin for shard 26
[05:22:35] [ ERROR] [FredBoatBot] Got rate limited while building bot JDA instance! Retrying...
[05:22:35] [ INFO ] [JDA] Finished Loading!
[05:22:39] [ INFO ] [FredBoat] Coin for shard 8
[05:22:40] [ ERROR] [FredBoatBot] Got rate limited while building bot JDA instance! Retrying...
[05:22:41] [ INFO ] [JDA] Finished (Re)Loading!
[05:22:44] [ INFO ] [FredBoat] Coin for shard 216
[05:22:47] [ INFO ] [JDA] Finished (Re)Loading!
[05:22:49] [ INFO ] [FredBoat] Coin for shard 30
[05:22:50] [ ERROR] [FredBoatBot] Got rate limited while building bot JDA instance! Retrying...
[05:22:52] [ INFO ] [JDA] Finished (Re)Loading!

freyacodes commented 7 years ago

Is this fixed by https://github.com/Frederikam/FredBoat/commit/9e95396ca3752b36268a1c58722ba4ac2e04e36d ?

schnapster commented 7 years ago

It's better (probably by a lot and good enough), but not fixed entirely. If a bunch of shards were to be manually revived while jda is (re)loading too, then it may hit the rate limit.

freyacodes / archived-bot

Respect Discord login limits with JDA in mind #230