Forever-Young / mrab-regex-hg

Automatically exported from code.google.com/p/mrab-regex-hg

slash handling in presence of a quantifier #83

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Hi,
I just noticed some strange matching behaviour with regard to the slash "/", while re 
seems to work as expected.
cf.:

What steps will reproduce the problem?
>>> regex.findall(r"c..+/c", "cA/c\ncAb/c")
['cA/c', 'cAb/c']

What is the expected output? What do you see instead?
cf.:
>>> re.findall(r"c..+/c", "cA/c\ncAb/c")
['cAb/c']
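To see why re's single match is the correct one, the pattern can be run under Python's standard `re` module in both newline modes. This is a minimal sketch using only `re`; the names `TEXT`, `PATTERN`, `default` and `dotall` are illustrative.

```python
import re

TEXT = "cA/c\ncAb/c"
PATTERN = r"c..+/c"

# Default mode: "." does not match "\n", and "..+" needs at least two
# characters between "c" and "/c", so "cA/c" (one character) cannot match.
default = re.findall(PATTERN, TEXT)
print(default)  # ['cAb/c']

# With re.DOTALL, "." may cross the newline, so one greedy match spans both lines.
dotall = re.findall(PATTERN, TEXT, re.DOTALL)
print(dotall)  # ['cA/c\ncAb/c']
```

Neither mode yields a separate 'cA/c' match, since that would require "..+" to match the single character "A".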

I first noticed this on HTML-like tags, but actually only the slash and 
(possibly?) some quantifier in the pattern seem to matter for this bug.

>>> regex.findall(r"<c>..+</c>", "<c>A</c>\n<c>Ab</c>")
['<c>A</c>', '<c>Ab</c>']
>>> re.findall(r"<c>..+</c>", "<c>A</c>\n<c>Ab</c>")
['<c>Ab</c>']
>>> 

The equivalent quantifier behaves the same:
>>> regex.findall(r"<c>.{2,}</c>", "<c>A</c>\n<c>Ab</c>")
['<c>A</c>', '<c>Ab</c>']
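The two spellings are interchangeable: "..+" is one character followed by one or more, i.e. at least two, which is exactly ".{2,}". The sketch below checks this with the standard `re` module (`re.fullmatch` needs Python 3.4+); the helper `matches` and the constants `PLUS` and `BRACES` are hypothetical names for illustration.

```python
import re

PLUS = r"<c>..+</c>"
BRACES = r"<c>.{2,}</c>"


def matches(pattern, inner):
    """Return True if pattern matches "<c>" + inner + "</c>" in full."""
    return re.fullmatch(pattern, "<c>" + inner + "</c>") is not None


# Compare both spellings for 0..4 characters between the tags.
# Both columns are True exactly when n >= 2.
for n in range(5):
    inner = "A" * n
    print(n, matches(PLUS, inner), matches(BRACES, inner))

print(re.findall(BRACES, "<c>A</c>\n<c>Ab</c>"))  # ['<c>Ab</c>']
```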

What version of the product are you using? On what operating system?

regex-0.1.20121120; Python 2.7.3, win 7

Please provide any additional information below.

-

regards,
  vbr

Original issue reported on code.google.com by Vlastimil.Brom@gmail.com on 16 Dec 2012 at 8:23

GoogleCodeExporter commented 9 years ago
Fixed in regex 0.1.20121216.

The bug was to do with a repeated character, as in ".+". It didn't matter that 
there was a slash; it could happen with other characters too.

Original comment by re...@mrabarnett.plus.com on 16 Dec 2012 at 9:13

GoogleCodeExporter commented 9 years ago
Thank you very much for the quick update.
I most likely misinterpreted some pattern properties, which turned out to be 
irrelevant. I wasn't quite able to narrow down the problematic pattern, as e.g.
>>> regex.findall(r"c..+x", "cAx\ncAbx")
['cAbx']
and others without a slash already worked OK.
Anyway, I am glad it is fixed now.
thanks,
 vbr

Original comment by Vlastimil.Brom@gmail.com on 16 Dec 2012 at 11:02

GoogleCodeExporter commented 9 years ago
You changed two things there: you replaced the slash with "x", and you made it 
shorter by removing the following "c". The regex module combines two or more 
literal characters into a string, so you might get a different behaviour when 
it comes to bugs.
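To illustrate the idea of combining literal characters, here is a toy sketch of such a pass over an already-parsed pattern. It is not the regex module's actual code: the node format (tuples like ("char", "c"), with quantifiers already attached to their operand as a "repeat" node) and the function name `coalesce_literals` are assumptions made for this example. It shows how "c..+x" and "c..+xc" can end in different kinds of node, and so exercise different matching code.

```python
def coalesce_literals(nodes):
    """Merge runs of consecutive single-character literals into one "string" node.

    Assumes quantifiers are already attached to their operand (e.g. ".+" is a
    single ("repeat", "any") node), so a literal followed by a quantifier never
    appears here as a bare "char" node.
    """
    merged = []
    for kind, value in nodes:
        if kind == "char" and merged and merged[-1][0] in ("char", "string"):
            # Extend the previous literal run instead of starting a new node.
            merged[-1] = ("string", merged[-1][1] + value)
        else:
            merged.append((kind, value))
    return merged


# Hypothetical parsed forms of the two patterns discussed above.
pattern_xc = [("char", "c"), ("any", None), ("repeat", "any"), ("char", "x"), ("char", "c")]  # c..+xc
pattern_x = [("char", "c"), ("any", None), ("repeat", "any"), ("char", "x")]  # c..+x

print(coalesce_literals(pattern_xc))
# [('char', 'c'), ('any', None), ('repeat', 'any'), ('string', 'xc')]
print(coalesce_literals(pattern_x))
# [('char', 'c'), ('any', None), ('repeat', 'any'), ('char', 'x')]
```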

An example of it not working even with the slash changed would have been:

regex.findall(r"c..+xc", "cAxc\ncAbxc")
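One way to catch this class of bug is a small differential check that runs each pattern from this report through both `regex` and the standard `re` and reports any disagreement. This is a hedged sketch: the third-party `regex` package may not be installed, in which case the comparison is simply skipped, and `CASES` and `mismatches` are illustrative names. On a version containing the fix (0.1.20121216 or later) one would expect it to print nothing for these cases.

```python
import re

try:
    import regex  # third-party module discussed in this thread
except ImportError:  # assumption: skip the comparison if it is not installed
    regex = None

# (pattern, text) pairs taken from this report.
CASES = [
    (r"c..+/c", "cA/c\ncAb/c"),
    (r"<c>..+</c>", "<c>A</c>\n<c>Ab</c>"),
    (r"<c>.{2,}</c>", "<c>A</c>\n<c>Ab</c>"),
    (r"c..+x", "cAx\ncAbx"),
    (r"c..+xc", "cAxc\ncAbxc"),
]


def mismatches(engine, reference=re, cases=CASES):
    """Return the cases where engine.findall disagrees with reference.findall."""
    bad = []
    for pattern, text in cases:
        got = engine.findall(pattern, text)
        expected = reference.findall(pattern, text)
        if got != expected:
            bad.append((pattern, text, got, expected))
    return bad


if regex is not None:
    for pattern, text, got, expected in mismatches(regex):
        print("{!r}: regex={!r} re={!r}".format(pattern, got, expected))
```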

Original comment by re...@mrabarnett.plus.com on 16 Dec 2012 at 11:28

GoogleCodeExporter commented 9 years ago
Ok,
thanks for the explanation; 
         vbr

Original comment by Vlastimil.Brom@gmail.com on 16 Dec 2012 at 11:43