Thank you for the PR!
The current design with two random delays was fully intentional, and hitting ratelimits more often is an accepted compromise.
Let me start by explaining why the random delay even exists. The initial logic was very simple: wait 5 minutes after server startup and then send the data every 30 minutes. What I didn't consider back then was that many servers tend to restart at a fixed time, on a full hour mark (xx:00). This caused a spike in requests every 30 minutes, at xx:05 and xx:35, which resulted in high CPU usage at these times.
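For illustration, the original logic boiled down to something like this (a minimal sketch with made-up class and method names, not the actual bStats code):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class NaiveMetricsScheduler {

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start(Runnable submitData) {
        // Fixed 5-minute startup delay, then a fixed 30-minute period.
        // If most servers restart at xx:00, every task ends up firing at xx:05 and xx:35.
        scheduler.scheduleAtFixedRate(submitData, 5, 30, TimeUnit.MINUTES);
    }
}
```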
The first idea to solve this issue was very similar to your PR: just make the initial delay a bit random (e.g., MiniDigger suggested 5 min + a random delay between 0 and 15 min). This, however, comes with its own set of issues, which means it is necessary to find a good balance and choose a random delay that is neither too short nor too long, or alternatively to find another approach.
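For reference, that first idea would look roughly like this (again just a sketch, not actual code from this repository):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

public class RandomStartScheduler {

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start(Runnable submitData) {
        // Fixed 5 minutes plus a random 0-15 minutes before the first submission,
        // then the unchanged 30-minute period.
        long initialDelayMinutes = 5 + ThreadLocalRandom.current().nextLong(0, 16);
        scheduler.scheduleAtFixedRate(submitData, initialDelayMinutes, 30, TimeUnit.MINUTES);
    }
}
```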
The current solution is a hybrid that tries to get the best of both worlds: it still uses a (short) initial random delay (between 3 and 6 minutes), but instead of just sending the data every 30 minutes after that, it introduces a second random delay between 0 and 30 minutes and only then starts the regular 30-minute interval. This makes it very likely to hit a ratelimit once initially (in fact, in nearly 50% of all cases). Since ratelimits are very cheap for the backend to handle in terms of CPU, this is no issue at all. It does, however, ensure that requests are evenly distributed, which solves the problem quite nicely.
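To make the timing concrete, here is a minimal sketch of that two-delay scheme (illustrative names, not copied from the actual MetricsBase implementation):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HybridScheduler {

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void start(Runnable submitData) {
        // First random delay: 3-6 minutes after startup.
        long initialDelayMs = (long) (1000 * 60 * (3 + Math.random() * 3));
        // Second random delay: an extra 0-30 minutes on top of the first one.
        long secondDelayMs = (long) (1000 * 60 * (Math.random() * 30));

        // One submission after the first delay (this is the request that may hit a ratelimit)...
        scheduler.schedule(submitData, initialDelayMs, TimeUnit.MILLISECONDS);
        // ...then the regular 30-minute interval, starting only after both delays have passed.
        scheduler.scheduleAtFixedRate(
                submitData, initialDelayMs + secondDelayMs, 1000L * 60 * 30, TimeUnit.MILLISECONDS);
    }
}
```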
That being said, I agree that there is still room for improvement:
While you need to enable the logFailedRequests config option to see it (which is only intended for debugging), the ratelimit error indeed looks scary with its huge stack trace and WARN log level. It might be a good idea to handle it differently and log a less verbose message without a stack trace, something along these lines: "Received 429 ratelimit from bStats. This is expected and not causing issues".
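A rough sketch of how that could look (the method signature and option names are made up for illustration, not taken from the real code):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class RatelimitLogging {

    private static final Logger LOGGER = Logger.getLogger("bStats");

    /**
     * Sketch of the suggested logging change: a short, stack-trace-free message
     * for the expected 429, and a verbose warning only for real failures.
     */
    static void logSubmitResult(int responseCode, boolean logFailedRequests, Exception error) {
        if (responseCode == 429) {
            // Expected for the very first submission; no stack trace needed.
            LOGGER.info("Received 429 ratelimit from bStats. This is expected and not causing issues.");
        } else if (logFailedRequests && error != null) {
            LOGGER.log(Level.WARNING, "Could not submit bStats metrics data", error);
        }
    }
}
```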
> Let me start by explaining why the random delay even exists.

I know why it exists: having made the Buycraft plugin, we ran into this issue too (and we had the advantage of near-infinite scaling on AWS!), and I implemented randomization to spread out the load from servers sending requests to pull purchases from Buycraft.
I'll probably implement your ideas here though, just to make the issue much less scary-looking.
Decided to not pursue this anymore - more pressing things to do.
Thanks anyway 🙂
I've created a new issue so it doesn't get forgotten: https://github.com/Bastian/bStats-Metrics/issues/86
See VelocityPowered/Velocity#476 for justification.