Forever-Young / mrab-regex-hg

Automatically exported from code.google.com/p/mrab-regex-hg

Infinite loop is found #127

Closed: GoogleCodeExporter closed this issue 9 years ago.

GoogleCodeExporter commented 9 years ago
If you're not using the latest version, please try that.
The latest version (2014.11.03) is being used.

What steps will reproduce the problem?
1. Compile the following regular expression:

import regex
reg_exp = regex.compile(r"<title>(\s|\n)*USAA\s+Military\s+Home,\s+Life\s+&\s+Auto\s+Insurance\s+|\s+Banking\s+&\s+Investing")

2. Read the test file (see attachment) into a variable. (It is actually the main page of http://cheap.nl.)

with open("cheap.nl.html", "r") as f:
    line = f.read()

3. Run the regular expression against this text:

reg_exp.search(line)
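The pattern from step 1 is worth reading closely. Because its `|` is not escaped, it is an alternation of two whole branches, and the second branch, `\s+Banking\s+&\s+Investing`, can start at any whitespace character in the input. The sketch below uses the stdlib `re` module as a stand-in for `regex` (both treat `|`, `\s` and these quantifiers the same way here). The `LITERAL_PIPE` variant is hypothetical: it assumes the reporter meant to match a literal `|` in the page title, which the report does not say.

```python
import re

# The reporter's pattern, unchanged apart from being written as a raw string.
PATTERN = (
    r"<title>(\s|\n)*USAA\s+Military\s+Home,\s+Life\s+&\s+Auto\s+Insurance\s+"
    r"|\s+Banking\s+&\s+Investing"
)
pat = re.compile(PATTERN)

# The unescaped "|" splits the whole pattern into two branches, so the
# second branch can match text that contains no "<title>" at all.
m = pat.search("xx   Banking & Investing")
print(repr(m.group()))

# \s already covers "\n", so (\s|\n)* behaves like \s*.
print(re.fullmatch(r"\s", "\n") is not None)

# Hypothetical variant, assuming a literal pipe was intended: escape it as \|.
LITERAL_PIPE = (
    r"<title>\s*USAA\s+Military\s+Home,\s+Life\s+&\s+Auto\s+Insurance\s+"
    r"\|\s+Banking\s+&\s+Investing"
)
title = "<title>\n USAA Military Home, Life & Auto Insurance | Banking & Investing"
print(re.search(LITERAL_PIPE, title) is not None)
print(re.search(LITERAL_PIPE, "xx   Banking & Investing") is None)
```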

What is the expected output? What do you see instead?

The search should finish in a reasonable amount of time. Instead, it runs indefinitely, keeping one CPU core fully busy.

Which version of Python? 32-bit or 64-bit?

Python 2.7.6, 64-bit.

Which operating system? Big-endian or little-endian?

Alt Linux, little-endian.

Please provide any additional information below.

Original issue reported on code.google.com by Vladimir...@gmail.com on 12 Nov 2014 at 2:38

Attachments:

GoogleCodeExporter commented 9 years ago
It appears that regex is busy working through a large chunk of whitespace in the file (it's 317210 bytes long!). (Who would create such a strange thing?)
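One plausible reason a long whitespace run hurts so much: an unanchored search retries the pattern at every start position, and a branch beginning with `\s+` greedily eats the rest of the run each time before backtracking. The helper below, `count_backtracking_steps`, is hypothetical and models a naive backtracking matcher, not the actual internals of `regex` (real engines add optimisations). Under this model, all-whitespace text of length n costs n(n+1) steps, which for 317210 characters is on the order of 10^11. Written for Python 3.

```python
def count_backtracking_steps(text: str, literal: str) -> int:
    # Hypothetical helper: models a naive backtracking search for
    # "one or more whitespace characters followed by `literal`", like the
    # second branch of the reported pattern. Not the regex module's code.
    steps = 0
    n = len(text)
    for start in range(n):  # search() retries at every start position
        # Greedy whitespace+: consume as much whitespace as possible.
        end = start
        while end < n and text[end].isspace():
            end += 1
            steps += 1
        # Backtrack one character at a time (keeping at least one char),
        # trying to match the literal after each candidate length.
        for stop in range(end, start, -1):
            steps += 1
            if text.startswith(literal, stop):
                return steps
    return steps  # no match anywhere


for n in (250, 500, 1000):
    print(n, count_backtracking_steps(" " * n, "Banking"))
```

Doubling the length of the whitespace run roughly quadruples the step count, which is the quadratic shape that makes a 317210-byte run look like a hang.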

As a temporary workaround, try compacting the whitespace with regex.sub(r"\s+", " ", text); it'll reduce the size of the text by nearly 90%!
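A minimal sketch of that workaround, using the stdlib `re.sub` as a stand-in for `regex.sub` (same call shape and same meaning for `\s+` here). The sample string is made up for illustration and is not the attached file. Since the reported pattern only ever asks for `\s+` or `(\s|\n)*` between words, collapsing each whitespace run to a single space should not change whether it matches.

```python
import re

PATTERN = (
    r"<title>(\s|\n)*USAA\s+Military\s+Home,\s+Life\s+&\s+Auto\s+Insurance\s+"
    r"|\s+Banking\s+&\s+Investing"
)


def compact_whitespace(text: str) -> str:
    # Collapse every run of whitespace (spaces, tabs, newlines) to one space.
    return re.sub(r"\s+", " ", text)


# Made-up sample with mixed whitespace and one very long run of spaces.
sample = (
    "<title>\n\n   USAA   Military Home,\tLife &\n\nAuto Insurance |"
    + " " * 10_000
    + "Banking & Investing</title>"
)
compacted = compact_whitespace(sample)
print(len(sample), len(compacted))
print(re.search(PATTERN, compacted).group())
```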

I'll try to see whether a fix is possible.

Original comment by re...@mrabarnett.plus.com on 12 Nov 2014 at 7:23

GoogleCodeExporter commented 9 years ago
Fixed in regex 2014.11.13.

Original comment by re...@mrabarnett.plus.com on 13 Nov 2014 at 2:02