eveseat / seat

🌀✳️ SeAT: A Simple, EVE Online API Tool and Corporation Manager
https://eveseat.github.io/docs/
GNU General Public License v2.0
433 stars 142 forks source link

Discord Notification Per-Route Rate Limit Exceeded with Notification Batches #912

Open kibawulf opened 3 months ago

kibawulf commented 3 months ago

Problem: Certain conditions, such as when large numbers of SeAT notification jobs fire in a short period of time utilizing Discord webhooks, can trigger Discord's per-route rate limiting, causing POSTs to fail with a 429 Too Many Requests which can lead to a MaxAttemptsExceededException SeAT exception.

This causes scenarios where large numbers of notification jobs may end up in a failed state and notifications will never be sent which could be risky depending on the severity of the notification (e.g. structure reinforcement, members leaving corps, etc). Current rate limits in SeAT appear to be set to 45/minute to avoid the documented global rate limit of 50/minute, but Discord also has an additional per-route rate limit value that is not specifically documented but is mentioned. This likely means that if a certain number of notifications that produce a Discord message configured to use the same webhook fire at or near the same time, jobs can fail. When this occurs enough times to cause a job failure, no message is sent to the Discord channel, the job retries a few times until it fails, and effectively the notification is lost.

Discord Rate Limiting Documentation: https://discord.com/developers/docs/topics/rate-limits

Expected: Rate limits are handled dynamically via header as per the Discord documentation to avoid rate limit conditions or adjusted further to avoid potential conditions.

Code sections that appear to be configured to handle Discord rate limiting: https://github.com/eveseat/notifications/blob/5.0.x/src/Notifications/AbstractDiscordNotification.php#L32 https://github.com/eveseat/notifications/blob/5.0.x/src/NotificationsServiceProvider.php#L226

Logs / Screenshots / Proof:

Note: This webhook is only being used for SeAT - it is not re-used for any other process.

First image shows API return hitting the rate limit and indicates that it is not the global limit - hence why I believe this is related to a per-route limit: image

This image shows behavior once the job fails enough times, rendering the job permanently failed (no notification): image

This image shows a number of failures around the same timeframe, two are pictured above. (Note: Items with "3" attempts are 429 Too Many Requests failures, items with "4" attempts are MaxAttemptsExceededException. image

Version Info: SeAT v5, all modules up-to-date as of 2024-07-31.