Closed GoogleCodeExporter closed 9 years ago
I think it's probably catastrophic backtracking. I'll need to discover why the
existing preventative measures aren't managing to stop it in this case (it's
not possible to stop it altogether).
I believe that catastrophic backtracking is exponential in behaviour.
Original comment by re...@mrabarnett.plus.com
on 24 Jan 2013 at 2:27
In standard 're', the same call does terminate and takes only 4 us - so if this
is an issue of long computation, it means that 'regex' module is in some cases
less efficient than 're', right? which in turn means that 'regex' won't work as
a general replacement for 're', because some existing code may break.
I'm not an expert in parsers implementation, but this problem of backtracking
reminds me of a similar issue that occurs in PEG parsers
(http://en.wikipedia.org/wiki/Parsing_expression_grammar), where basic
implementation of backtracking has exponential worst-case complexity, but when
coupled with a technique of 'memoization' the complexity stays always *linear*
even for difficult cases, only at the cost of higher memory consumption (but
still at most linear in the total size of input). See:
http://en.wikipedia.org/wiki/Memoization
http://en.wikipedia.org/wiki/Parsing_expression_grammar#Implementing_parsers_fro
m_parsing_expression_grammars
I think PEGs are more general than regex'es, so the solution should be
applicable to regexes, too, although I can't say anything more specific.
Original comment by mwojn...@gmail.com
on 24 Jan 2013 at 11:30
Sorry for doubting you, it _is_ a bug. :-(
It'll be fixed in the next release.
Original comment by re...@mrabarnett.plus.com
on 24 Jan 2013 at 2:03
Fixed in regex 0.1.20130124.
Original comment by re...@mrabarnett.plus.com
on 24 Jan 2013 at 8:31
I've updated the package, but the bug is still present. In ipython:
>>> import regex
>>> regex
<module 'regex' from
'/usr/local/lib/python2.7/dist-packages/regex-0.1.20130124-py2.7-linux-i686.egg/
regex.pyc'>
(version 0.1.20130124 - newest one)
>>> regex.match(r'.*a.*ba.*aa', "ababba")
-- doesn't terminate after 2 hours
Original comment by mwojn...@gmail.com
on 25 Jan 2013 at 5:07
It got unfixed during the other work. I'm surprised you let it run for 2 hours!
Fixed in regex 0.1.20130125.
Original comment by re...@mrabarnett.plus.com
on 25 Jan 2013 at 8:52
Thanks, works for me now.
Original comment by mwojn...@gmail.com
on 26 Jan 2013 at 12:10
Original issue reported on code.google.com by
mwojn...@gmail.com
on 24 Jan 2013 at 12:20