HelloZeroNet / ZeroNet

ZeroNet - Decentralized websites using Bitcoin crypto and BitTorrent network
https://zeronet.io

SafeRe is vulnerable to ReDoS #2757

Open gqgs opened 3 years ago

gqgs commented 3 years ago

Step 1: Please describe your environment

Step 2: Describe the problem:

"To avoid the ReDoS algorithmic complexity attack", the function below is used to validate user-defined regular expressions.

https://github.com/HelloZeroNet/ZeroNet/blob/454c0b2e7e000fda7000cba49027541fbf327b96/src/util/SafeRe.py#L10-L22

This function fails to reject regular expressions that can take exponential time to match against user input.

Steps to reproduce:

>>> from SafeRe import isSafePattern, match
>>> p = "a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
>>> isSafePattern(p)
True
>>> match(p, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa")

Observed Results:

match hangs and the execution never completes.

Expected Results:

isSafePattern should properly detect that the pattern is unsafe. Alternatively, match should use an algorithm with guaranteed linear time complexity to compile and match inputs (e.g. Thompson NFA).
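As a rough illustration of the linear-time alternative, here is a minimal sketch of a Thompson-style state-set simulation. It is not ZeroNet code, and it supports only a tiny subset of regex syntax (literal characters and `x?`), which happens to be enough for the pattern in the reproduction above. The names `parse_optional_literals`, `closure`, and `nfa_match` are made up for this example.

```python
def parse_optional_literals(pattern):
    """Parse a pattern made only of literal characters and 'x?' items.

    Returns a list of (char, optional) tuples. Hypothetical helper:
    no other regex syntax is understood.
    """
    tokens = []
    i = 0
    while i < len(pattern):
        optional = i + 1 < len(pattern) and pattern[i + 1] == "?"
        tokens.append((pattern[i], optional))
        i += 2 if optional else 1
    return tokens


def closure(states, tokens):
    """Extend a set of pattern positions by skipping optional tokens."""
    result = set()
    stack = list(states)
    while stack:
        s = stack.pop()
        if s in result:
            continue
        result.add(s)
        # An optional token may be skipped without consuming input.
        if s < len(tokens) and tokens[s][1]:
            stack.append(s + 1)
    return result


def nfa_match(pattern, text):
    """Prefix match (like re.match) by tracking all live states at once."""
    tokens = parse_optional_literals(pattern)
    accept = len(tokens)
    states = closure({0}, tokens)
    if accept in states:
        return True
    for ch in text:
        # Advance every live state that can consume this character.
        advanced = {s + 1 for s in states if s < accept and tokens[s][0] == ch}
        states = closure(advanced, tokens)
        if accept in states:
            return True
        if not states:
            return False
    return False


if __name__ == "__main__":
    p = "a?" * 50 + "a" * 50
    print(nfa_match(p, "a" * 50))
```

Because the set of live states can never hold more than one entry per pattern position, the work per input character is bounded by the pattern length, so the pattern from the reproduction finishes immediately instead of backtracking.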

rllola commented 3 years ago

We could replace this with RE2 (https://github.com/google/re2). There are Python bindings available (https://pypi.org/project/google-re2/).

wandrien commented 3 years ago

@rllola

Many zites make use of (?!...), and RE2 doesn't seem to support it (https://github.com/google/re2/wiki/Syntax). The problem is that we neither validate patterns against a formally allowed regexp syntax nor have a formal definition of that syntax at all. Our regexp syntax is implicitly Python's re syntax.

Not sure if it is possible to move to RE2 in a backward compatible way.
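One way to gauge how many existing patterns would break is to scan them for lookaround groups before deciding. The helper below is only a sketch with a hypothetical name (`uses_lookaround`). It uses plain string scanning, does not depend on RE2, and ignores character classes, so it may report a false positive for something like `[(?!]`.

```python
# Group prefixes for lookahead and lookbehind in Python's re syntax.
LOOKAROUND_PREFIXES = ("(?=", "(?!", "(?<=", "(?<!")


def uses_lookaround(pattern):
    """Return True if the pattern appears to contain a lookaround group.

    Hypothetical helper: escaped characters are skipped, but character
    classes are not parsed, so this is a heuristic rather than a parser.
    """
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if pattern.startswith(LOOKAROUND_PREFIXES, i):
            return True
        i += 1
    return False


if __name__ == "__main__":
    for candidate in [r".*(?!foo)bar", r"(?:abc)+", r"\(?!x"]:
        print(candidate, uses_lookaround(candidate))
```

Patterns flagged by such a scan would need to stay on Python's re or be rewritten, which is the backward compatibility concern raised above.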

wandrien commented 3 years ago

https://github.com/zeronet-enhanced/ZeroNet/commit/2a25d61b968a21aa98c6db2ca9d64f1bbdc54773

In my fork, I (temporarily) fixed this by treating ? the same way as the other "repetitions", so the total number of repetition markers cannot exceed 9.

Not sure if it is a proper or complete solution. I'm not familiar with the ReDoS type of attack or with regexp implementation details.
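The idea of counting ? as a repetition can be sketched roughly as follows. This is an illustration written for this thread, not the code from the commit linked above: `count_repetitions`, `is_probably_safe`, and `MAX_REPETITIONS` are hypothetical names, and the limit of 9 is taken from the description above.

```python
MAX_REPETITIONS = 9  # assumed limit, taken from the description above


def count_repetitions(pattern):
    """Count unescaped repetition markers: * + { and ?.

    A "?" directly after an unescaped "(" is not counted, because there
    it introduces a group extension such as (?!...) or (?:...).
    Markers inside character classes and lazy suffixes like "*?" are
    also counted, so the result errs on the side of over-counting.
    """
    count = 0
    i = 0
    after_open_paren = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
            after_open_paren = False
            continue
        if ch in "*+{":
            count += 1
        elif ch == "?" and not after_open_paren:
            count += 1
        after_open_paren = (ch == "(")
        i += 1
    return count


def is_probably_safe(pattern):
    """Heuristic check only: accept patterns with few repetition markers."""
    return count_repetitions(pattern) <= MAX_REPETITIONS


if __name__ == "__main__":
    p = "a?" * 50 + "a" * 50
    print(count_repetitions(p), is_probably_safe(p))
```

With this count, the pattern from the reproduction is rejected because it contains 50 markers, while a pattern such as `.*(?!foo)bar` counts as a single repetition.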