This fixes a catastrophic backtracking issue with BACKSLASH_URL_RE by updating the regex to match the format used by the bracket regex. All current tests pass before and after the change.
Proof of concept script
Before fix
After fix
I wasn't able to pinpoint exactly what in the sample text triggers the backtracking, but URL extraction time grows much faster than the length of the text. The sample text above is 3.7k and takes <10 seconds; the original text I was having issues with was ~26k, and extraction took almost 3 minutes.
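For readers unfamiliar with this failure mode, here is a minimal, generic sketch of catastrophic backtracking in Python's `re` module. It is not the proof of concept script from this PR, and neither pattern is BACKSLASH_URL_RE or the bracket regex: `NESTED_RE`, `FLAT_RE`, and `time_match` are hypothetical names made up for illustration. The sketch only shows how nested quantifiers can make a failed match blow up with input length, while an equivalent pattern without the nesting stays fast.

```python
import re
import time

# Hypothetical patterns for illustration only; neither is the library's regex.
# Nested quantifiers: a run of "a" can be split into groups in many ways,
# and the engine tries them all before it can report a failed match.
NESTED_RE = re.compile(r"^(a+)+$")
# Accepts exactly the same strings, but with a single quantifier.
FLAT_RE = re.compile(r"^a+$")


def time_match(pattern, text):
    """Return (match_found, elapsed_seconds) for one match attempt."""
    start = time.perf_counter()
    found = pattern.match(text) is not None
    return found, time.perf_counter() - start


if __name__ == "__main__":
    for n in (12, 14, 16, 18, 20):
        # The trailing "b" guarantees that the match fails, which is
        # what forces the engine to exhaust every way of grouping.
        text = "a" * n + "b"
        _, nested_seconds = time_match(NESTED_RE, text)
        _, flat_seconds = time_match(FLAT_RE, text)
        print(f"n={n:2d}  nested={nested_seconds:.6f}s  flat={flat_seconds:.6f}s")
```

Running the script prints one timing line per input length. The nested pattern's time should climb sharply as `n` grows, while the flat pattern's stays roughly constant. The actual trigger in the URL regex may well be different, as noted above.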
Fixes #52