SecurityInnovation / PGPy

Pretty Good Privacy for Python
BSD 3-Clause "New" or "Revised" License
317 stars 99 forks source link

Catastrophic Backtracking Issue in ASCII Armor Regex #374

Open braedon opened 3 years ago

braedon commented 3 years ago

Armorable.__armor_regex.search() can run indefinitely on certain text. This appears to be a catastrophic backtracking issue.

For example, pgpy.PGPMessage.from_blob() will run for hours without completion on this signed security.txt file. I've created a gist that replicates the issue.

The file in question doesn't match the regex because it includes a space on each blank line, which the regex doesn't allow for the blank lines after the headers. The file uses \r\n line endings, which seems to be required for the issue - presumably due to interactions between the optional \rs and the various other optional clauses. Replace the line endings with just \n and the search fails quickly.

The issue can be mitigated by changing how the cleartext group detects the end of the cleartext block.

It currently matches arbitrary lines, and then uses a positive lookahead assertion to match a final line that's followed by a line starting with -----. I'm not a regex expert, but I think this is problematic as the cleartext group can match the whole rest of the text before the lookahead assertion (and other later clauses) makes it backtrack.

A negative lookahead assertion can instead be used to stop the cleartext group when it first encounters a line starting with -----.

Replace:

(?P<cleartext>(.*\r?\n)*(.*(?=\r?\n-{5})))(?:\r?\n)

With:

(?P<cleartext>(.*\r?\n(?!-{5}))*.*)(?:\r?\n)

i.e. match as many lines as you can that aren't followed by -----, and then match the final line (implicitly followed by -----).

This change reduces the search failure return time for this file to ~1s on my laptop - not great, but usable.

Note that I don't think the regex should fail to match this file at all - my reading of rfc4880 is that whitespace should be allowed in the blank lines after the headers - but I'll raise another issue for that.

Commod0re commented 3 years ago

we've had a lot more trouble with this particular regex than I expected when I implemented it, tbh. It mostly works, but, I think it causes more problems than it solves. You could submit a PR with that change, I definitely want to switch approaches to something faster and less finnicky in the long run though