Forever-Young / mrab-regex-hg

Automatically exported from code.google.com/p/mrab-regex-hg

Performance regression somewhere between 0.1.20110315 and 0.1.20120613 #70

GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The following code

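import regex
from collections import deque

# body holds the text of the test mailbox linked below
tokens = deque()
pattern = regex.compile(r"^(content-type:.+(\n\s+.+)?)|--(.*)", regex.I | regex.M)

for m in pattern.finditer(body):
    tokens.append((m.group(0), m.start(), m.end()))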

It takes 8 ms on 0.1.20110315
and 90 ms on 0.1.20120613.

Over 1000 iterations the slowdown is even larger (200x slower).

Here's the string I've been using for tests:

ftp://ftp.cac.washington.edu/mail/mime-examples/torture-test.mbox

Original issue reported on code.google.com by klizhen...@gmail.com on 2 Jul 2012 at 12:23

GoogleCodeExporter commented 9 years ago
There have been bug fixes related to repetition between those two versions. 
Although regex 0.1.20110315 gives the correct results for this pattern and 
text, it gives incorrect results for certain other patterns and texts.

For this text I find that regex 0.1.20120613 is between 1.5 and 2 times faster 
than re with Python 3.2, gives the same results as re whether matching Unicode 
or ASCII, and that the times scale as expected for the number of iterations, so 
the time per iteration remains about the same.
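
A comparison along these lines can be sketched as follows. This is only a rough illustration, not the script used for the figures above: SAMPLE_BODY is a hypothetical stand-in for the torture-test mailbox, the helper names are made up for this sketch, and regex is treated as optional so the script still runs with re alone.

```python
import re
import timeit

try:
    import regex  # third-party module; optional in this sketch
except ImportError:
    regex = None

PATTERN = r"^(content-type:.+(\n\s+.+)?)|--(.*)"

# Hypothetical stand-in for the torture-test mailbox, not the real file.
SAMPLE_BODY = (
    "Content-Type: multipart/mixed;\n"
    "    boundary=\"frontier\"\n"
    "\n"
    "--frontier\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--frontier--\n"
) * 200


def compile_pattern(module):
    """Compile the reported pattern with the given module (re or regex)."""
    return module.compile(PATTERN, module.I | module.M)


def collect_tokens(pattern, body):
    """Return (text, start, end) for every match, as in the reported code."""
    return [(m.group(0), m.start(), m.end()) for m in pattern.finditer(body)]


def seconds_per_iteration(pattern, body, iterations):
    """Average wall-clock seconds for one full scan of body."""
    total = timeit.timeit(lambda: collect_tokens(pattern, body), number=iterations)
    return total / iterations


if __name__ == "__main__":
    modules = [re] + ([regex] if regex is not None else [])
    for module in modules:
        compiled = compile_pattern(module)
        for iterations in (1, 10, 100):
            per_iter = seconds_per_iteration(compiled, SAMPLE_BODY, iterations)
            print(f"{module.__name__}: {iterations} iterations, "
                  f"{per_iter * 1e3:.3f} ms each")
```

If the time per iteration stays roughly flat as the iteration count grows, the cost scales linearly, which is the behaviour described above. Comparing the token lists from the two modules is a quick way to check that they return the same matches.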

Original comment by re...@mrabarnett.plus.com on 2 Jul 2012 at 4:00

GoogleCodeExporter commented 9 years ago
You might be interested to know that performance improvements in regex 
0.1.20121031 have now made this particular test about 15 times faster than re 
with Python 3.2.

Original comment by re...@mrabarnett.plus.com on 31 Oct 2012 at 3:48