Forever-Young / mrab-regex-hg

Automatically exported from code.google.com/p/mrab-regex-hg

* operator not working correctly with sub() #106

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
>>> regex.sub('.*', 'x', 'test')
u'xx' <--- This is wrong

>>> regex.sub('.+', 'x', 'test')
u'x'

>>> re.sub('.*', 'x', 'test')
u'x' <--- This is correct

>>> regex.sub('.*?', '|', 'test')
u'|||||||||' <--- This is wrong

>>> re.sub('.*?', '|', 'test')
u'|t|e|s|t|' <--- This is correct

Python 2.7, 64-bit Linux, regex version 2.4.39 compiled from source.

Original issue reported on code.google.com by adse...@calibre-ebook.com on 30 Jan 2014 at 1:28

GoogleCodeExporter commented 9 years ago
How it should behave is a bit of a grey area.

The re module gives 'x' and '|t|e|s|t|'.

Perl and PCRE give 'xx' and '|||||||||'.

This is because, after .* or .*? has matched one or more characters, it can
still match 0 characters at the position where that match ended. There are
cases where the re module definitely gets this wrong, so it's not clear
whether it's getting it right here.

Original comment by re...@mrabarnett.plus.com on 30 Jan 2014 at 3:03
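To make the two policies concrete, here is a toy model in Python. It is a sketch, not how re, regex, Perl or PCRE are actually implemented: the hypothetical helpers `candidate_ends`, `sub_perl_style` and `sub_old_re_style` only handle the patterns `.*` and `.*?` on a subject without newlines, so the possible match ends at each position can be listed directly in the order a backtracking engine would try them. The two substitution loops differ only in how they treat empty matches.

```python
from typing import List, Optional


def candidate_ends(s: str, pos: int, lazy: bool) -> List[int]:
    """End offsets that a match of .* (greedy) or .*? (lazy) starting at pos
    could have, in the order a backtracking engine would try them.
    Assumes s contains no newlines, so '.' matches every character."""
    ends = list(range(pos, len(s) + 1))
    return ends if lazy else ends[::-1]


def sub_perl_style(s: str, repl: str, lazy: bool) -> str:
    """Perl/PCRE-style global substitution: an empty match is allowed right
    after a non-empty one, but after an empty match the next match at the
    same offset must be non-empty; if none exists, skip one character."""
    out = []
    pos = 0
    forbid_empty = False
    while pos <= len(s):
        end: Optional[int] = next(
            (e for e in candidate_ends(s, pos, lazy)
             if not (forbid_empty and e == pos)),
            None,
        )
        if end is None:
            # No acceptable match here: copy one character and move on.
            out.append(s[pos:pos + 1])
            pos += 1
            forbid_empty = False
            continue
        out.append(repl)
        forbid_empty = end == pos  # remember whether this match was empty
        pos = end
    return "".join(out)


def sub_old_re_style(s: str, repl: str, lazy: bool) -> str:
    """Rough model of re.sub in Python 2.7: an empty match that starts where
    the previous match ended is ignored, and after an empty match the search
    resumes one character further on."""
    out = []
    copied_to = 0        # s[:copied_to] has already been emitted or replaced
    search_from = 0
    matched_before = False
    while search_from <= len(s):
        start = search_from
        # .* and .*? always match at start; take the highest-priority end.
        end = candidate_ends(s, start, lazy)[0]
        if not (matched_before and start == end == copied_to):
            out.append(s[copied_to:start])  # unmatched text before the match
            out.append(repl)
            copied_to = end
            matched_before = True
        search_from = end + 1 if end == start else end
    out.append(s[copied_to:])
    return "".join(out)


print(sub_perl_style("test", "x", lazy=False))    # xx
print(sub_perl_style("test", "|", lazy=True))     # |||||||||
print(sub_old_re_style("test", "x", lazy=False))  # x
print(sub_old_re_style("test", "|", lazy=True))   # |t|e|s|t|
```

The Perl-style loop reproduces 'xx' and '|||||||||', and the old-re loop reproduces 'x' and '|t|e|s|t|', as quoted in the report. (Python 3.7 later changed re.sub so that empty matches adjacent to a previous non-empty match are replaced as well.)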

GoogleCodeExporter commented 9 years ago
Hmm, well, I don't really have an opinion. The behavior of re seems intuitively
correct to me, but that may just be because I have been using re for
years.

I just thought I'd report the discrepancy, as one of the goals of regex is (as 
I understand it) to replace re as seamlessly as possible.

Original comment by adse...@calibre-ebook.com on 30 Jan 2014 at 3:09

GoogleCodeExporter commented 9 years ago
Fixed in regex 2014.01.30.

It now behaves more like the re module when using the version 0 behaviour.

Original comment by re...@mrabarnett.plus.com on 30 Jan 2014 at 9:36