NicolaasWeideman / RegexStaticAnalysis

A tool to perform static analysis on regexes to determine whether they are vulnerable to ReDoS.
MIT License
109 stars, 28 forks

Project moved to maven and fixed deprecated dependencies #17

Closed. symphony-enrico closed this 4 years ago

davisjam commented 4 years ago

I certainly support moving to Maven over the current Makefile. I haven't tested this PR locally though.

Skimming over the commit, I think LGTM.

NicolaasWeideman commented 4 years ago

Thank you, I will take a look at this when I have a moment.

symphony-enrico commented 4 years ago

> Thank you, I will take a look at this when I have a moment.

Thanks to you @NicolaasWeideman for this really interesting project 😄

We are considering using your library for our project, and I am running some tests on it. If we decide to use it, we will need to modify it so that it can also be called as a library (not only from the command line), so I have forked it and started working on it.

davisjam commented 4 years ago

@symphony-enrico Can you comment more on your intended use case?

symphony-enrico commented 4 years ago

> @symphony-enrico Can you comment more on your intended use case?

The use case is to validate, in real time, regexes received by an API that is used to programmatically build forms with validation. I tested another library, but in some cases your project is much faster, so it looks suitable for real-time validation (<1 second). To use it that way, however, we need to be able to call it as a library and not only from the command line; the result must also be returned in a form the calling software can read (not only printed to the console). I have already made some modifications for testing, but I have not created a pull request because it is only a draft for now.
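To make the "callable as a library, with a machine-readable result" requirement concrete, here is a minimal sketch in Java. Everything in it is hypothetical: `RegexGate`, `Analyzer`, `Verdict`, and `Result` are illustration names, not part of RegexStaticAnalysis, and `Analyzer` only stands in for whatever entry point a library build would expose. The sketch also enforces a time budget on the analysis, since the target is a response in under one second.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class RegexGate {
    enum Verdict { SAFE, VULNERABLE, TIMED_OUT, ERROR }

    /** Hypothetical stand-in for the analysis entry point of a library build. */
    interface Analyzer {
        boolean isVulnerable(String regex) throws Exception;
    }

    /** Structured result the caller can inspect instead of parsing console output. */
    static final class Result {
        final String regex;
        final Verdict verdict;
        final String detail;

        Result(String regex, Verdict verdict, String detail) {
            this.regex = regex;
            this.verdict = verdict;
            this.detail = detail;
        }
    }

    /** Runs the analysis on a worker thread and gives up after budgetMillis. */
    static Result check(Analyzer analyzer, String regex, long budgetMillis) {
        // Daemon thread: an analysis that ignores interruption cannot block JVM exit.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "regex-analysis");
            t.setDaemon(true);
            return t;
        });
        Future<Boolean> future = executor.submit(() -> analyzer.isVulnerable(regex));
        try {
            boolean vulnerable = future.get(budgetMillis, TimeUnit.MILLISECONDS);
            return new Result(regex, vulnerable ? Verdict.VULNERABLE : Verdict.SAFE, "");
        } catch (TimeoutException e) {
            // cancel(true) only interrupts; the analysis must cooperate to actually stop.
            future.cancel(true);
            return new Result(regex, Verdict.TIMED_OUT,
                    "analysis exceeded " + budgetMillis + " ms");
        } catch (ExecutionException e) {
            return new Result(regex, Verdict.ERROR, String.valueOf(e.getCause()));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new Result(regex, Verdict.ERROR, "interrupted");
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A caller would probably treat `TIMED_OUT` and `ERROR` the same as `VULNERABLE` and reject the regex, since an analysis that cannot finish within the budget gives no evidence of safety.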

davisjam commented 4 years ago

> The use case is to validate, in real time, regexes received by an API that is used to programmatically build forms with validation.

Cool!

I also note that this tool does not detect all forms of super-linear behavior. In particular, both backreferences and lookaround assertions can yield exponential behavior. They are not supported by this tool. Are you able to perform regex evaluations under a timeout, similar to the .NET timeout system?
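For readers on the JVM: `java.util.regex` has no built-in match timeout comparable to the `matchTimeout` argument of .NET's `Regex`. One commonly used workaround is to hand the matcher a `CharSequence` wrapper that checks a deadline on every character access. The sketch below assumes the regex engine reads its input through `charAt`, which holds for `java.util.regex.Matcher`; the class and exception names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TimedRegex {
    /** Thrown when matching runs past its deadline (hypothetical name). */
    static final class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException(String message) {
            super(message);
        }
    }

    /** CharSequence view that aborts character access once the deadline has passed. */
    private static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            // Overflow-safe comparison, as recommended for System.nanoTime().
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new RegexTimeoutException("regex evaluation timed out");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /** Full-match test that throws RegexTimeoutException instead of running unbounded. */
    static boolean matchesWithin(Pattern pattern, String input, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        Matcher matcher = pattern.matcher(new DeadlineCharSequence(input, deadline));
        return matcher.matches();
    }
}
```

Checking the clock on every `charAt` call adds overhead; a production version might check only every few thousand accesses. The point of the design is that the check runs on the matching thread itself, so a runaway match actually stops, which a plain `Future.get` timeout around the match does not guarantee.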

symphony-enrico commented 4 years ago

> > The use case is to validate, in real time, regexes received by an API that is used to programmatically build forms with validation.

> Cool!

>   • For safety and performance, you might be interested in some of the ideas in this paper -- search for "regex variants".
>   • The relevant variant code is in vuln-regex-detector. It's not bulletproof by any stretch of the imagination; a regex rewriting pass during parsing would be a better bet.

> I also note that this tool does not detect all forms of super-linear behavior. In particular, both backreferences and lookaround assertions can yield exponential behavior. They are not supported by this tool. Are you able to perform regex evaluations under a timeout, similar to the .NET timeout system?

Thanks very much for the hints, I will take a look.

In fact, the idea is to have a first barrier: the API will receive a form template (with some regexes to validate the form), but the API itself doesn't parse or use the regexes or the form. It only performs the first check, responding with an error if the received template contains a potentially dangerous regex. If accepted, the template will be used later; as a second barrier, the regexes will be re-validated at runtime (by another library) and also run with a timeout if possible.
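A rough sketch of what that first barrier could look like, assuming a template can be reduced to a map from field name to regex. `TemplateBarrier` and both of its methods are hypothetical names. Because backreferences and lookarounds are outside what the static analysis covers, the example checker simply rejects regexes that appear to use them; it is a crude textual test (it ignores escaping and character classes) and would sit alongside the real analysis rather than replace it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;

final class TemplateBarrier {
    // Crude textual test for lookarounds "(?=", "(?!", "(?<=", "(?<!",
    // numbered backreferences "\1".."\9", and named backreferences "\k<".
    private static final Pattern UNSUPPORTED =
            Pattern.compile("\\(\\?<?[=!]|\\\\[1-9]|\\\\k<");

    /** Conservative check: true if the regex seems to use a feature the analysis skips. */
    static boolean usesUnsupportedFeature(String regex) {
        return UNSUPPORTED.matcher(regex).find();
    }

    /** Returns the names of the fields whose regex the checker flags, in iteration order. */
    static List<String> rejectedFields(Map<String, String> fieldRegexes,
                                       Predicate<String> looksDangerous) {
        List<String> rejected = new ArrayList<>();
        for (Map.Entry<String, String> field : fieldRegexes.entrySet()) {
            if (looksDangerous.test(field.getValue())) {
                rejected.add(field.getKey());
            }
        }
        return rejected;
    }
}
```

The API would respond with an error whenever `rejectedFields` is non-empty, and listing the offending field names gives the caller something actionable. The `Predicate` parameter is where the static analysis result would be combined with the conservative check.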

davisjam commented 4 years ago

The multi-barrier scheme seems reasonable.

You may also want to think about the degree of vulnerability.

Depending on what valid input looks like, some regexes may be more concerning than others. See the "Trim" fix strategy in Table 5 of this paper.
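As I read it, the "Trim" strategy amounts to bounding the length of the input before the regex is evaluated, so that even a super-linear regex has a capped worst case. A minimal Java sketch under that assumption follows; `BoundedInput`, `matchesBounded`, and the `maxLength` parameter are illustration names, and the right limit depends on what valid input looks like for each form field.

```java
import java.util.regex.Pattern;

final class BoundedInput {
    /**
     * Rejects over-long input outright instead of evaluating the regex on it.
     * Assumes valid values for the field never exceed maxLength characters.
     */
    static boolean matchesBounded(Pattern pattern, String input, int maxLength) {
        if (input.length() > maxLength) {
            return false;
        }
        return pattern.matcher(input).matches();
    }
}
```

This only helps when the blow-up needs long input. A regex with exponential behavior can become slow on a few dozen characters, so a length cap is better suited to polynomial cases and does not replace a timeout.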