davisjam / vuln-regex-detector

Detect vulnerable regexes in your project. REDOS, catastrophic backtracking.
MIT License

COULD-NOT-PARSE evilInput for 'wuestholz-RegexCheck' detector #68

Closed: ColdFire87 closed this issue 4 years ago

ColdFire87 commented 4 years ago

Hi,

I tried running the detector using the npm module but I got back an INVALID message:

const vulnRegexDetector = require('vuln-regex-detector');

const regex = /^Bearer [a-zA-Z0-9\-_]+?\.[a-zA-Z0-9\-_]+?\.[a-zA-Z0-9\-_]+?$/; // RegExp
const pattern = regex.source; // String

const config = {
    cache: {
        type: vulnRegexDetector.cacheTypes.persistent,
    }
};

vulnRegexDetector
    .test(pattern, config)
    .then((result) => {
        if (result === vulnRegexDetector.responses.vulnerable) {
            console.log('Regex is vulnerable');
        } else if (result === vulnRegexDetector.responses.safe) {
            console.log('Regex is safe');
        } else {
            console.log('Not sure if regex is safe or not');
        }
    });
(node:38788) UnhandledPromiseRejectionWarning: INVALID
(node:38788) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:38788) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

So I used Docker instead to build the image locally, then ran it against a regex pattern for validating a JWT.

Here is the input file:

{"pattern": "^Bearer [a-zA-Z0-9\\-_]+?\\.[a-zA-Z0-9\\-_]+?\\.[a-zA-Z0-9\\-_]+?$", "validateVuln_language": "javascript", "validateVuln_nPumps": 100000, "validateVuln_timeLimit": 2, "useCache": 0}

Here is part of the console output:

Checking wuestholz-RegexCheck for timeout-triggering evil input
wuestholz-RegexCheck: the regex may be vulnerable (isVariant 1)
  wuestholz-RegexCheck: Could not parse the evil input

And here is the generated report:

{
  "detectReport": {
    "timeLimit": "60",
    "memoryLimit": "8192",
    "pattern": "^Bearer [a-zA-Z0-9\\-_]+?\\.[a-zA-Z0-9\\-_]+?\\.[a-zA-Z0-9\\-_]+?$",
    "detectorOpinions": [
      {
        "patternVariant": "^Bearer [a-zA-Z0-9\\-_]+?\\.[a-zA-Z0-9\\-_]+?\\.[a-zA-Z0-9\\-_]+?$",
        "name": "rathnayake-rxxr2",
        "opinion": {
          "isSafe": 1,
          "canAnalyze": 1
        },
        "hasOpinion": 1,
        "secToDecide": "0.0411"
      },
      {
        "secToDecide": "0.7431",
        "hasOpinion": 1,
        "opinion": {
          "canAnalyze": 1,
          "isSafe": 1
        },
        "name": "weideman-RegexStaticAnalysis",
        "patternVariant": "^Bearer [a-zA-Z0-9\\-_]+?\\.[a-zA-Z0-9\\-_]+?\\.[a-zA-Z0-9\\-_]+?$"
      },
      {
        "hasOpinion": 1,
        "secToDecide": "0.6552",
        "patternVariant": "^Bearer [a-zA-Z0-9\\-_]+?\\.[a-zA-Z0-9\\-_]+?\\.[a-zA-Z0-9\\-_]+?$",
        "name": "wuestholz-RegexCheck",
        "opinion": {
          "isSafe": 0,
          "evilInput": [
            "COULD-NOT-PARSE"
          ],
          "canAnalyze": 1
        }
      },
      {
        "opinion": {
          "canAnalyze": true,
          "isSafe": true,
          "evilInput": []
        },
        "patternVariant": "^Bearer [a-zA-Z0-9\\-_]+?\\.[a-zA-Z0-9\\-_]+?\\.[a-zA-Z0-9\\-_]+?$",
        "name": "shen-ReScue",
        "secToDecide": "46.3193",
        "hasOpinion": 1
      }
    ]
  },
  "isVulnerable": 0,
  "pattern": "^Bearer [a-zA-Z0-9\\-_]+?\\.[a-zA-Z0-9\\-_]+?\\.[a-zA-Z0-9\\-_]+?$"
}

I'm a bit confused. Is the regex vulnerable but the wuestholz-RegexCheck detector couldn't provide an example of an evil input? Is the regex safe?

Please help! Thanks!

And thank you for creating this project!

davisjam commented 4 years ago

Answering your questions

INVALID

My lab machine has been repurposed so I don't think the web server is working.
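
Separately from the server problem, the warnings in the original post appear because the promise rejected with INVALID and nothing caught it. A minimal sketch of catching the rejection follows. `checkPattern` and its `testFn` parameter are hypothetical names of mine: `testFn` stands in for any promise-returning checker, such as the `test` call in the original post.

```javascript
// Hypothetical helper: `testFn` is any function that takes a pattern and
// returns a promise, e.g. (p) => vulnRegexDetector.test(p, config).
async function checkPattern(testFn, pattern) {
  try {
    const result = await testFn(pattern);
    return { ok: true, result };
  } catch (reason) {
    // A rejection (such as INVALID) ends up here instead of going unhandled.
    return { ok: false, reason };
  }
}

// Stand-in checker that always rejects, to show the shape of the outcome.
checkPattern(() => Promise.reject('INVALID'), '^a+$')
  .then((outcome) => console.log(outcome)); // { ok: false, reason: 'INVALID' }
```

With the rejection caught, the caller can treat INVALID as "no answer" rather than letting Node.js report an unhandled rejection.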

So I used Docker instead to build the image locally, then ran it against a regex pattern for validating a JWT.

Glad this worked!

Is the regex vulnerable but the wuestholz-RegexCheck detector couldn't provide an example of an evil input?

Wuestholz thinks it's vulnerable, but my tool couldn't parse Wuestholz's output.
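
To spot this situation in a report, here is a rough sketch that lists the detectors which flagged the pattern but whose evil input could not be parsed. The field names come from the report you posted. `unparsedFlags` and `sampleReport` are names I made up for illustration, and the sample is a trimmed copy of your report.

```javascript
// Lists detectors that flagged the pattern but whose evil input could not be parsed.
// `isSafe` shows up as 0/1, as a boolean, or as the string "UNKNOWN" in these
// reports, so only 0 and false are treated as "flagged".
function unparsedFlags(report) {
  return report.detectReport.detectorOpinions
    .filter((d) => d.hasOpinion && (d.opinion.isSafe === 0 || d.opinion.isSafe === false))
    .filter((d) => (d.opinion.evilInput || []).includes('COULD-NOT-PARSE'))
    .map((d) => d.name);
}

// Trimmed-down copy of the report from the original post.
const sampleReport = {
  detectReport: {
    detectorOpinions: [
      { name: 'rathnayake-rxxr2', hasOpinion: 1, opinion: { isSafe: 1, canAnalyze: 1 } },
      { name: 'weideman-RegexStaticAnalysis', hasOpinion: 1, opinion: { isSafe: 1, canAnalyze: 1 } },
      { name: 'wuestholz-RegexCheck', hasOpinion: 1,
        opinion: { isSafe: 0, canAnalyze: 1, evilInput: ['COULD-NOT-PARSE'] } },
      { name: 'shen-ReScue', hasOpinion: 1, opinion: { isSafe: true, canAnalyze: true, evilInput: [] } },
    ],
  },
  isVulnerable: 0,
};

console.log(unparsedFlags(sampleReport)); // [ 'wuestholz-RegexCheck' ]
```

A detector on this list has an opinion that was never validated against a real match, which is why the top-level `isVulnerable` can still be 0.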

Is the regex safe?

Yes. This is a false positive from Wuestholz. (Weideman is more trustworthy in my experience.)

Why is the regex safe?

Your regex has no ambiguity -- at each point in the pattern, when the regex engine encounters a character it has no choice about how to parse it.

Related regex that is unsafe

If your custom character classes included a . as one of the valid characters, your regex would have ambiguity. While in one of the char classes with a ., when the regex engine saw a . it would have a choice about whether to consume it in the char class or to advance to the adjacent char class. This would give your regex polynomial worst-case time complexity on malicious input.

Here's an example of such a regex:

{"pattern": "^Bearer [.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?$", "validateVuln_language": "javascript", "validateVuln_nPumps": 100000, "validateVuln_timeLimit": 2, "useCache": 0}

(Same as your original regex, but I added a . to each character class).
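
The difference between the two patterns can be checked directly: does the separator belong to the character class that precedes it? This is only a sketch of the idea, not how any of the detectors work, and `separatorIsAmbiguous` is a name I made up.

```javascript
// One character class from each pattern in this thread, as standalone regexes.
const originalClass = /^[a-zA-Z0-9\-_]$/;
const dottedClass = /^[.a-zA-Z0-9\-_]$/;

// If the separator is also a member of the class, the engine has two ways to
// consume it: as part of the class, or as the literal separator that follows.
function separatorIsAmbiguous(classRegex, separator) {
  return classRegex.test(separator);
}

console.log(separatorIsAmbiguous(originalClass, '.')); // false
console.log(separatorIsAmbiguous(dottedClass, '.'));   // true
```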

(12:36:46) jamie@woody ~/Desktop/floss/vuln-regex-detector $ ./bin/check-regex.pl /tmp/reg.json
Config says to use the cache
Config says useCache 1
Query says I should not use the cache
Using default for detectVuln_memoryLimit: 8192
Using default for detectVuln_timeLimit: 60
Querying detectors
/home/jamie/Desktop/floss/vuln-regex-detector/src/detect/detect-vuln.pl /tmp/check-regex-12252.json 2>>/tmp/check-regex-12252-progress.log
Detectors said: {"pattern":"^Bearer [.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?$","detectorOpinions":[{"patternVariant":"^Bearer [.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?$","opinion":{"canAnalyze":0,"isSafe":"UNKNOWN"},"hasOpinion":1,"secToDecide":"0.0211","name":"rathnayake-rxxr2"},{"hasOpinion":1,"secToDecide":"0.6453","name":"weideman-RegexStaticAnalysis","patternVariant":"^Bearer [.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?$","opinion":{"isSafe":0,"canAnalyze":1,"evilInput":[{"suffix":"B ","pumpPairs":[{"pump":".a","prefix":"Bearer a"},{"prefix":"a","pump":".a"}]}],"predictedComplexity":"polynomial"}},{"patternVariant":"^Bearer [.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?$","opinion":{"isSafe":0,"canAnalyze":1,"evilInput":["COULD-NOT-PARSE"]},"hasOpinion":1,"secToDecide":"0.3439","name":"wuestholz-RegexCheck"},{"opinion":{"evilInput":[],"isSafe":true,"canAnalyze":true},"patternVariant":"^Bearer [.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?$","name":"shen-ReScue","hasOpinion":1,"secToDecide":"19.0108"}],"timeLimit":"60","memoryLimit":"8192"}
Checking rathnayake-rxxr2 for timeout-triggering evil input
  rathnayake-rxxr2: says not vulnerable
Checking weideman-RegexStaticAnalysis for timeout-triggering evil input
weideman-RegexStaticAnalysis: the regex may be vulnerable (isVariant 1)
  weideman-RegexStaticAnalysis: Validating the evil input (query: {"language":"javascript","timeLimit":2,"nPumps":100000,"pattern":"^Bearer [.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?$","evilInput":{"suffix":"B ","pumpPairs":[{"pump":".a","prefix":"Bearer a"},{"prefix":"a","pump":".a"}]}})
/home/jamie/Desktop/floss/vuln-regex-detector/src/validate/validate-vuln.pl /tmp/check-regex-12252.json 2>>/tmp/check-regex-12252-progress.log
  weideman-RegexStaticAnalysis: evil input triggered a regex timeout
{"pattern":"^Bearer [.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?$","validateReport":{"timedOut":1,"timeLimit":"2","language":"javascript","validPattern":1,"evilInput":{"suffix":"B ","pumpPairs":[{"pump":".a","prefix":"Bearer a"},{"pump":".a","prefix":"a"}]},"pattern":"^Bearer [.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?$","nPumps":100000},"detectReport":{"pattern":"^Bearer [.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?$","detectorOpinions":[{"patternVariant":"^Bearer [.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?$","opinion":{"canAnalyze":0,"isSafe":"UNKNOWN"},"hasOpinion":1,"secToDecide":"0.0211","name":"rathnayake-rxxr2"},{"hasOpinion":1,"secToDecide":"0.6453","name":"weideman-RegexStaticAnalysis","patternVariant":"^Bearer [.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?$","opinion":{"isSafe":0,"canAnalyze":1,"evilInput":[{"suffix":"B ","pumpPairs":[{"pump":".a","prefix":"Bearer a"},{"prefix":"a","pump":".a"}]}],"predictedComplexity":"polynomial"}},{"patternVariant":"^Bearer [.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?$","opinion":{"isSafe":0,"canAnalyze":1,"evilInput":["COULD-NOT-PARSE"]},"hasOpinion":1,"secToDecide":"0.3439","name":"wuestholz-RegexCheck"},{"opinion":{"evilInput":[],"isSafe":true,"canAnalyze":true},"patternVariant":"^Bearer [.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?\\.[.a-zA-Z0-9\\-_]+?$","name":"shen-ReScue","hasOpinion":1,"secToDecide":"19.0108"}],"timeLimit":"60","memoryLimit":"8192"},"isVulnerable":1}

Looking at that output, we can see:

Checking rathnayake-rxxr2 for timeout-triggering evil input
  rathnayake-rxxr2: says not vulnerable
Checking weideman-RegexStaticAnalysis for timeout-triggering evil input
weideman-RegexStaticAnalysis: the regex may be vulnerable (isVariant 1)
  weideman-RegexStaticAnalysis: Validating the evil input ...
  weideman-RegexStaticAnalysis: evil input triggered a regex timeout

This means that Weideman's tool found the regex vulnerable and proposed an evil input. With that input, Node.js took longer than the time limit from your query (validateVuln_timeLimit, 2 here) to perform the regex match.
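
If you want to try this yourself, here is a rough sketch that rebuilds an attack string from the `evilInput` object in the log and times a match. It assumes that each pump pair contributes its prefix followed by `nPumps` copies of its pump, with the suffix appended once at the end. `buildEvilInput` and `timeMatch` are illustrative names, not functions from this project.

```javascript
// Assumption: each pump pair contributes its prefix followed by `nPumps`
// copies of its pump, and the suffix is appended once at the end.
function buildEvilInput(evilInput, nPumps) {
  const body = evilInput.pumpPairs
    .map(({ prefix, pump }) => prefix + pump.repeat(nPumps))
    .join('');
  return body + evilInput.suffix;
}

// Times a single match attempt, in milliseconds.
function timeMatch(regex, input) {
  const start = Date.now();
  const matched = regex.test(input);
  return { matched, ms: Date.now() - start };
}

// Evil input copied from the weideman-RegexStaticAnalysis opinion in the log.
const weidemanEvilInput = {
  suffix: 'B ',
  pumpPairs: [
    { prefix: 'Bearer a', pump: '.a' },
    { prefix: 'a', pump: '.a' },
  ],
};
const unsafeRegex = /^Bearer [.a-zA-Z0-9\-_]+?\.[.a-zA-Z0-9\-_]+?\.[.a-zA-Z0-9\-_]+?$/;

console.log(JSON.stringify(buildEvilInput(weidemanEvilInput, 2))); // "Bearer a.a.aa.a.aB "

// Small pump counts keep this quick; the elapsed time grows much faster than
// the input length as nPumps increases.
for (const n of [25, 50, 100]) {
  console.log(n, timeMatch(unsafeRegex, buildEvilInput(weidemanEvilInput, n)));
}
```

The trailing space in the suffix is outside every character class, so the match always fails, and the engine only reports that after trying the many ways of splitting the dots between the three classes.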

The end

I'm closing this issue since I think I've answered your questions. Please re-open if you have more to say.

ColdFire87 commented 4 years ago

@davisjam Thanks for the thorough explanation! 👍