AdamPflug / express-brute

Brute-force protection middleware for express routes by rate limiting incoming requests
MIT License

global bruteforce count is not updating on more than 1000 concurrent requests #46

Open subhamAggarwal opened 8 years ago

subhamAggarwal commented 8 years ago

On using the express-brute package with a global limit applied, the count is not updated properly in the express-brute store. I've tried both MongoDB and Redis as brute-force stores, with the same results. I tested this with JMeter, making a load request of 1000 concurrent requests to my API. Once all 1000 requests are completed, the count in the express-brute store never rises above roughly 150. If I try the same with 100/200 concurrent requests, the count is updated correctly in Redis. Can anyone confirm whether they see similar results with this kind of use case?

AdamPflug commented 8 years ago

Looks like a bug resulting from a race condition when you have multiple concurrent requests. Both processes get the current counter value (e.g. 1), and then both processes attempt to set the counter to one more (e.g. set the counter to 2). Both operations succeed, but the counter doesn't accurately reflect both requests (it is 2 when it should be 3).

It'll take some investigation to fix this. For one thing, it's something that will need to be addressed on a store-by-store basis because underlying stores will have different ways to handle the concurrency.

As an example, right now the redis store is storing the state as a JSON blob in a single key (per IP if applicable). To fix this problem we'd need to break the count out of that into its own key so we can use INCR instead of SET to avoid the lost updates. I don't think we need to worry about multiple updates overwriting each other as much with the other data (last request time, first request time) because the values will be close enough to each other that the behavior should be pretty much right. This still has concurrency issues (multiple requests can go in "at the buzzer" and they'll all succeed), but it seems like the best approach given the performance costs of trying to serialize all the get/sets.
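The lost-update race and the INCR-style fix described above can be sketched with an in-memory stand-in for an asynchronous store (all names here are hypothetical, not the express-brute or Redis client API):

```javascript
// Illustrative sketch: a fake async store contrasting read-then-write
// (which loses updates under concurrency) with a single-step atomic
// increment, which is the role INCR plays in Redis.
class FakeStore {
  constructor() { this.count = 0; }
  // async read, like GET
  get() { return new Promise(res => setImmediate(() => res(this.count))); }
  // async write, like SET
  set(v) { return new Promise(res => setImmediate(() => { this.count = v; res(); })); }
  // single-step update, like INCR
  incr() { return new Promise(res => setImmediate(() => res(++this.count))); }
}

// The racy pattern: concurrent callers can all read the same value
// and all write back value + 1, losing most of the requests.
async function recordRacy(store) {
  const current = await store.get();
  await store.set(current + 1);
}

async function main() {
  const racy = new FakeStore();
  await Promise.all(Array.from({ length: 1000 }, () => recordRacy(racy)));

  const atomic = new FakeStore();
  await Promise.all(Array.from({ length: 1000 }, () => atomic.incr()));

  return { racyCount: racy.count, atomicCount: atomic.count };
}
```

In this simulation the racy counter ends far below 1000 while the atomic counter reaches exactly 1000, mirroring the undercounting reported in the original issue.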

jbrwn commented 7 years ago

Looks like a bug resulting from a race condition

I was evaluating this library and noticed this pretty quickly. It seems like you are asking for trouble doing a get/set in code, no? I think a better approach would be to offload this to the data store and let it do its thing in one ATOMIC operation.

AdamPflug commented 7 years ago

Agreed; the trick is that this makes the fix cache-engine specific, so it would need to be implemented in each storage adapter individually... To me, the underlying issue is that we went to all the effort of standardizing the cache interface but tied it specifically to express-brute. I think we need something like Waterline for cache backends, but I haven't found a good option. I'm considering starting a project for that, but it makes this a larger effort...

eyelidlessness commented 6 years ago

So, it seems this hasn't been addressed for some time. Is this still an issue? If this library is doing a get/set, it is guaranteeing that it will fail to prevent an attack. This should be top priority for any rate limiting library.

jerroydmoore commented 5 years ago

FYI. npm audit is now flagging this as a High vulnerability. Our team discovered this running our production deployment checklist today. I expect you'll be hearing a lot of +1s in the future...

https://www.npmjs.com/advisories/823

AdamPflug commented 5 years ago

I should have some time to address this in early June. Note that because of the nature of the issue it'll likely require a breaking change (v2.0), which will require either new or updated storage adapters.

As always I welcome pull requests (or suggestions on existing storage adapter code I could leverage to avoid reinventing the wheel) if people are looking to fast track the process (since creating the API and implementing that project is the biggest blocker to fixing this).

jerroydmoore commented 5 years ago

A breaking change in what sense? The interface or the data store?

AdamPflug commented 5 years ago

@jerroydmoore long story short, I'm currently thinking it will only affect the data stores, but that will mean you'll need to update your data stores if you want to migrate to version 2.

More context on my version 2.0 plans is in #83

AdamPflug commented 5 years ago

Hey all, I wanted to stop in with a quick update and summarize my thoughts on this, as well as explain why it's taking longer than expected to fix.

First of all, I want to clarify that I don't believe this is accurately characterized as a "high severity" security vulnerability. The main purpose of the library is to mitigate the effects of brute force attacks, so limitations of the library do have a security impact, but this issue doesn't cause any of the problems you'd traditionally associate with vulnerabilities (especially severe ones): crashes, denial of service, information leakage, arbitrary code execution, privilege escalation, etc. Really this issue just means that an attacker can sneak in some extra requests. Obviously that isn't ideal and we should do the best we can to prevent it, but if an attacker can try 50 passwords instead of 10 they're not that much more likely to be successful (and you're certainly still much better off with express-brute than without it; most of the alternatives aren't much better). I do understand that for some use cases, such as PCI compliance, enforcing an exact, deterministic maximum number of requests before a lockout is important.

All that said, I agree this is something that needs to be fixed, and I want to explain why it's not quite as simple as "just use atomic increments". The fundamental problem is that concurrent requests are going to be an issue regardless of whether we do atomic increments or not. For example, consider this express-brute configuration:

new ExpressBrute(store, {
    freeRetries: 0,
    minWait: 5*60*1000, // 5 minutes
    maxWait: 15*60*1000 // 15 minutes
});

If an attacker issues a bunch of simultaneous requests to an endpoint protected by this instance, more than one will likely get through even if we're using atomic increments, because the increment doesn't actually happen until after the initial checks. It's worth noting that these attacks scale with the application's hosting infrastructure more than with the attacker's resources: the more threads running your application, the greater the latency between the store and the application servers, and the heavier the load on your store servers, the more susceptible you are to these attacks. It's also worth noting that this doesn't affect MemoryStore, because that store is synchronous. When freeRetries is 0 (or a user hits the limit of free retries), atomic increments don't help prevent extra requests on their own.
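This check-before-increment failure mode can be sketched with a toy async store (hypothetical names, not the express-brute API): even when the increment itself is atomic, every in-flight request reads the old count and passes the check.

```javascript
// Toy async store with an atomic increment (illustrative only).
// Because the limit check happens before the increment, all
// simultaneous requests see the same count and are all allowed.
class FakeStore {
  constructor() { this.count = 0; }
  get() { return new Promise(res => setImmediate(() => res(this.count))); }
  incr() { return new Promise(res => setImmediate(() => res(++this.count))); }
}

// With a limit of 1 (i.e. freeRetries: 0), only one request
// should be allowed before lockout.
async function checkThenIncrement(store, limit) {
  const count = await store.get();  // every concurrent request reads the old count...
  if (count >= limit) return false; // ...so this check passes for all of them
  await store.incr();
  return true;
}

async function main() {
  const store = new FakeStore();
  const results = await Promise.all(
    Array.from({ length: 10 }, () => checkThenIncrement(store, 1))
  );
  return results.filter(Boolean).length; // requests allowed through
}
```

In this simulation all 10 simultaneous requests are allowed even though the limit is 1.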

The other interesting thing about the above scenario is the end-state. If we have atomic increments in place then we now have a counter of, say, 5 requests which were executed simultaneously, much faster than should have been allowed. When should the 6th request be allowed? A: in 5 minutes (minWait is usually the length of the first lockout)? B: in 60 minutes (the minimum time it would take to make 6 requests assuming no extra requests were snuck in)? C: in 15 minutes (the value of maxWait)? I can see arguments for all of the above. For what it's worth, right now the behavior of express-brute with freeRetries set to 0 is essentially A, and even with the default freeRetries setting of 2 the real-world behavior is pretty close to that. Also note that every time a new request is allowed we're vulnerable to multiple simultaneous requests again.
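For concreteness, option B's 60 minutes can be reproduced arithmetically if we assume, purely for illustration, a doubling backoff that starts at minWait and is capped at maxWait (this is a sketch of the arithmetic, not express-brute's exact delay curve):

```javascript
// Illustrative arithmetic only: assume each lockout doubles the previous
// wait, starting at minWait and capped at maxWait. With freeRetries: 0
// there is one lockout before each request after the first, so the
// minimum time to legitimately make n requests is the sum of the
// first n - 1 lockouts.
const minWait = 5;   // minutes
const maxWait = 15;  // minutes

function minimumTimeForRequests(n) {
  let total = 0;
  let wait = minWait;
  for (let i = 0; i < n - 1; i++) {
    total += wait;
    wait = Math.min(wait * 2, maxWait);
  }
  return total;
}

const optionA = minWait;                   // next request in 5 minutes
const optionB = minimumTimeForRequests(6); // 5 + 10 + 15 + 15 + 15 = 60 minutes
const optionC = maxWait;                   // next request in 15 minutes
```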

So, we need to prevent multiple requests from being allowed in the first place. There are a couple of ways we can handle simultaneous requests: either serialize all reads and writes to the store (some form of locking, with a significant performance cost), or atomically increment the counter first and then use the returned value to decide whether the request is allowed.

I think the 2nd option is much more realistic, but it still needs a unique implementation for each store, and depending on the store the logic can be a little tricky. Note that, depending on the store, we may still have multiple processes fighting to set the TTL and last valid request time, but since they're all setting them to the same (or at least close enough to the same) values this should be fine. This is also all a bit chattier with the stores than our current implementation, so there is unfortunately some performance overhead associated with it (we move from 1-2 store requests per middleware execution to 2-7 per execution for memcached, for example, though stores that support multiple values per key like redis, or stores with conditional updates like dynamodb or sql, will not take as big a hit). We might also be able to reduce some of the effect of this additional chattiness by using an in-memory LRU cache to limit the number of requests going to the stores from each process, though that comes with its own complexities.
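A minimal sketch of the increment-first approach, using the same kind of toy async store as above (hypothetical names, not a real store adapter): each request atomically increments first and decides based on the returned value, so at most the allowed number of concurrent requests succeed.

```javascript
// Toy async store with an atomic increment (illustrative only).
// Incrementing first means each concurrent caller observes a unique
// counter value, so exactly `limit` of them can pass the check.
class FakeStore {
  constructor() { this.count = 0; }
  incr() { return new Promise(res => setImmediate(() => res(++this.count))); }
}

async function incrementThenCheck(store, limit) {
  const count = await store.incr(); // atomic: each caller gets a distinct value
  return count <= limit;            // only the first `limit` callers pass
}

async function main() {
  const store = new FakeStore();
  const results = await Promise.all(
    Array.from({ length: 10 }, () => incrementThenCheck(store, 1))
  );
  return results.filter(Boolean).length; // requests allowed through
}
```

Here exactly 1 of the 10 simultaneous requests is allowed. The remaining writes (TTL, last valid request time) can still race, but as noted above the competing values are close enough to each other that this doesn't matter much in practice.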

Anyway, in addition to all of this I've also been dealing with some illnesses in the family (my wife, dad, step mom, and mother in law are all currently in treatment), and my daughter has had some daycare closures and stuff that reduced the time I had to work on this (I'm not being paid to work on it so I need to fit it into my free time). So that's why things are taking a little longer than intended, but it's still something I'm actively working on and I hope to get it all fixed with #83 soon.

jerroydmoore commented 5 years ago

Thanks for the investigation @AdamPflug! I agree that, since the impact isn't measured in orders of magnitude, "high severity" might be overstated. Are you aware of an appeals process with npm to downgrade the rating?

I think folks will be better served sticking with your express-brute package rather than switching to rate-limiter-flexible by @animir. I tried migrating, but it was more work than it was worth. Additionally, I tried to find more information about @animir, since these packages are used for security reasons, and, unfortunately, there isn't much to find.

Edit: Sorry to hear about your family. Of course, family comes first. If you have Contributing Guidelines and a list of tasks rated by effort, new contributors could help you.

huntr-helper commented 4 years ago

👋 Hey folks! We've recently opened a bug bounty against this issue, so if you want to get rewarded 💰 for fixing this vulnerability 🕷, head over to https://huntr.dev!

animir commented 4 years ago

@huntr-helper There is already a solution for this: use ExpressBruteFlexible instead.