axiak / pyre2

Python wrapper for RE2
BSD 3-Clause "New" or "Revised" License

pyre2 anchor '$' has non-standard behaviour that is not compatible with re #41

Open omerlaufer opened 8 years ago

omerlaufer commented 8 years ago

The documentation for GNU regex, and also for the re module, says:

This operator can match the empty string either at the end of the string or before a newline character in the string

Consider this simple example:

re.findall(r'abc$', 'bla bla abc\n')
['abc']

re2.findall(r'abc$', 'bla bla abc\n')
[]

Is this behaviour intentional?
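The difference can be reproduced with the standard library alone. The sketch below assumes that RE2's `$` (without multi-line mode) matches only at the very end of the text, which is what Python's `\Z` does; the `\Z` pattern is a stand-in for illustration and not a call into pyre2 itself.

```python
import re

text = 'bla bla abc\n'

# Python's `$` also matches just before a newline at the end of the string.
dollar_matches = re.findall(r'abc$', text)

# `\Z` matches only at the absolute end of the string. It is used here as a
# stand-in (an assumption) for how `$` appears to behave in RE2.
strict_end_matches = re.findall(r'abc\Z', text)

print(dollar_matches)      # ['abc']
print(strict_end_matches)  # []
```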

andreasvc commented 8 years ago

Duplicate of #40.

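I am curious whether anyone knows a solution. RE2 doesn't provide an equivalent operator (Perl's \Z, which matches at the end of the text or before a newline at the end of the text, is listed as not supported), cf. https://github.com/google/re2/wiki/Syntax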

It's not enough to use \n?$, because that includes the newline in the match, which changes the result in this case and behaves differently in substitutions.

Incidentally, why do you care? I find this behavior of $ rather odd and can't see why it's desirable.
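The problem with `\n?$` can be illustrated using only the standard `re` module. In this sketch, `\Z` again stands in for a strict end-of-text anchor (an assumption about how RE2's `$` behaves), and the capture-group variants at the end are a hypothetical workaround that was not proposed in this thread.

```python
import re

text = 'bla bla abc\n'

# Reference behaviour of `$` in Python's re: the trailing newline is kept.
reference_sub = re.sub(r'abc$', 'X', text)

# Emulating the anchor with an optional newline plus a strict end-of-text
# anchor: the newline becomes part of the match and is lost on substitution.
naive_find = re.findall(r'abc\n?\Z', text)
naive_sub = re.sub(r'abc\n?\Z', 'X', text)

# Hypothetical workaround: capture the optional newline and put it back,
# or capture only the part of the match that is actually wanted.
grouped_sub = re.sub(r'abc(\n?)\Z', r'X\1', text)
grouped_find = re.findall(r'(abc)\n?\Z', text)

print(repr(reference_sub))  # 'bla bla X\n'
print(repr(naive_sub))      # 'bla bla X'
print(repr(grouped_sub))    # 'bla bla X\n'
```

The grouped patterns use only capture groups, so no lookahead is needed, but every affected pattern and replacement string would have to be rewritten by hand.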

omerlaufer commented 8 years ago
  1. This is the standard behaviour.
  2. The goal of pyre2 is:

The stated goal of this module is to be a drop-in replacement for re.

In other words, it should be compatible with re.

I understand if some features are not supported, either because they fundamentally can't be done by the RE2 engine or just because they haven't been implemented yet. But if pyre2 behaves differently for the same regex, it's not aligned with pyre2's goal.