Closed by Sytten 8 years ago
"To avoid big spikes when buckets expire"
What causes the spike?
Is it because Redis doesn't like having multiple keys expiring at the same time? Or is it because we end up creating a bunch of new keys all at the same time?
Actually no, the problem is more that all buckets expire at the same time, so clients doing while(1) all start to hammer the platform at the same moment. With the sliding window, a client hammering our infra will never be able to reach the platform, since it will always be blocked first: the buckets are never fully reset, they just decrease over time.
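For illustration only, here is a minimal fixed-window counter (hypothetical names, not this project's code) that shows the abrupt reset behind the synchronized spikes; a sketch of the slice-based sliding window is further down in this thread.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical fixed-window counter: once the window expires, the whole
// bucket resets at once, so a client looping on requests gets another full
// burst through at the start of every window, and all clients do it in sync.
class FixedWindowCounter {
    private final int limit;
    private final Duration window;
    private Instant windowStart = Instant.EPOCH;
    private int count = 0;

    FixedWindowCounter(int limit, Duration window) {
        this.limit = limit;
        this.window = window;
    }

    synchronized boolean tryAcquire(Instant now) {
        if (Duration.between(windowStart, now).compareTo(window) >= 0) {
            windowStart = now; // the bucket is wiped in one shot...
            count = 0;         // ...and every blocked client gets through together
        }
        if (count >= limit) {
            return false;
        }
        count++;
        return true;
    }
}
```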
"Actually no, the problem is more that all buckets expire at the same time."
Wouldn't a simpler solution be to add a small offset to the bucket, based on the hash of the key for example?
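For what it's worth, the offset idea could look roughly like this hypothetical helper, which derives a small deterministic jitter from the key's hash so bucket expirations are staggered instead of all landing on the same boundary (the names and the 5 second spread are assumptions, not anything from the project):

```java
import java.time.Duration;

// Hypothetical helper: spread bucket expirations over up to maxJitter,
// deterministically per key, so they do not all expire on the same boundary.
final class BucketJitter {
    private BucketJitter() {}

    static Duration offsetFor(String key, Duration maxJitter) {
        long jitterMillis = Math.floorMod((long) key.hashCode(), maxJitter.toMillis());
        return Duration.ofMillis(jitterMillis);
    }
}

// Usage (illustrative):
// expireAt = windowEnd.plus(BucketJitter.offsetFor(clientKey, Duration.ofSeconds(5)));
```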
If I remember correctly, we discussed this at the start of the project, and in the end we decided it wasn't needed because you can protect against "spikes" by combining a limit on a short interval (e.g. max X events/sec) with a limit on a larger bucket such as X events/hour (see the sketch below).
I do think having a true sliding window gives a "smoother" behavior, but I'm hard pressed to find real use cases where it's really important. Thoughts?
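As a rough illustration of that two-limit alternative (hypothetical names, not the library's API), checking every configured limit before allowing a call might look like this:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical guard combining several fixed-window limits: a call is only
// allowed when it fits inside every configured window.
class MultiLimitGuard {
    private static final class Window {
        final int capacity;
        final Duration length;
        Instant start = Instant.EPOCH;
        int count = 0;

        Window(int capacity, Duration length) {
            this.capacity = capacity;
            this.length = length;
        }
    }

    private final List<Window> windows = new ArrayList<>();

    MultiLimitGuard addLimit(int capacity, Duration length) {
        windows.add(new Window(capacity, length));
        return this;
    }

    synchronized boolean tryAcquire(Instant now) {
        for (Window w : windows) {
            if (Duration.between(w.start, now).compareTo(w.length) >= 0) {
                w.start = now;
                w.count = 0;
            }
            if (w.count >= w.capacity) {
                return false;
            }
        }
        for (Window w : windows) {
            w.count++;
        }
        return true;
    }
}

// e.g. new MultiLimitGuard().addLimit(10, Duration.ofSeconds(1))
//                           .addLimit(1000, Duration.ofHours(1))
```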
@GuiSim It could help to do that; it was another proposed solution. @malaporte Yes, but I tested it and it didn't work that well. I didn't find a way to allow normal usage while keeping the limit small enough that a single machine hammering us would actually exceed it. Plus, @pastjean needed it, so he might be able to say more about it.
We have a meeting this afternoon to see which implementation we really need and, if we need both, how to integrate them better. I also agree that it is getting quite complex and the framework was not designed for it...
After some thought, I am putting this feature on hold for now. I will do some bug fixes and release a V1.0.1. If we ever need it, we will have the code.
To avoid big spikes when buckets expire, I wrote a new backend with a sliding window bucket. The idea is to divide the limit window into fixed-size slices and sum them to compute the total over the window. I must admit that the framework wasn't designed for this and I had to use a few tricks, but I still think it is robust and should handle our needs.
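A minimal sketch of the slice idea, assuming illustrative names rather than the actual InMemorySlidingStorage code, could look like this:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.TreeMap;

// Hypothetical slice-based counter: each hit lands in the slice covering
// "now", and the window total is the sum of the slices still inside the window.
class SlidingWindowCounter {
    private final Duration window;    // e.g. one hour
    private final Duration sliceSize; // e.g. one minute
    private final TreeMap<Long, Integer> slices = new TreeMap<>(); // slice index -> hit count

    SlidingWindowCounter(Duration window, Duration sliceSize) {
        this.window = window;
        this.sliceSize = sliceSize;
    }

    synchronized void recordHit(Instant now) {
        long slice = now.toEpochMilli() / sliceSize.toMillis();
        slices.merge(slice, 1, Integer::sum);
    }

    // Drop slices that fell entirely out of the window, then sum the rest.
    synchronized int totalInWindow(Instant now) {
        long oldestSlice = now.minus(window).toEpochMilli() / sliceSize.toMillis();
        slices.headMap(oldestSlice, false).clear();
        return slices.values().stream().mapToInt(Integer::intValue).sum();
    }
}
```

Because only the oldest slice drops out at a time, the total decays gradually instead of resetting all at once.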
The only thing missing is performance. I see a couple of bottlenecks in the InMemorySlidingStorage, especially in the computation of the sliding window. As for the network, I reused most of the async batch backend. The only difference is that I limited the number of synced slices to avoid too much network load: I only sync the current slice (if the slice duration is greater than the sync interval) or the last slices (if the slice duration is less than the sync interval).
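As a rough sketch of that sync rule (how many "last slices" get sent is an assumption here, not taken from the actual backend):

```java
import java.time.Duration;

// Hypothetical sketch of the sync decision: if a slice lasts longer than the
// sync interval, only the current slice can have changed since the last sync,
// so only it is sent; otherwise enough recent slices are sent to cover
// everything touched since the last sync.
final class SlicesToSync {
    private SlicesToSync() {}

    static int count(Duration sliceSize, Duration syncInterval) {
        if (sliceSize.compareTo(syncInterval) >= 0) {
            return 1; // only the current slice
        }
        long slice = sliceSize.toMillis();
        long sync = syncInterval.toMillis();
        return (int) ((sync + slice - 1) / slice); // ceil(syncInterval / sliceSize)
    }
}
```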