ammar / regexp_parser

A regular expression parser library for Ruby
MIT License

`regexp_parser` rejects `/\xA/` but MRI accepts it #75

Closed: dgollahon closed this issue 3 years ago

dgollahon commented 3 years ago

Hi,

I am working on re-introducing regexp mutation support in mutant, and I noticed that since the old integration existed, regexp_parser has stopped rejecting a large percentage of the regexps that Ruby accepts but regexp_parser previously did not (https://github.com/ammar/regexp_parser/issues/63). I did find one additional case that I could not find documented anywhere. (I tried brute-forcing millions of regexps to see whether there were any cases where regexp_parser is stricter than MRI, and this is the only class of instances I could find.)

"\xA" # => "\n"
/\xA/.match?("\n") # => true

Regexp::Parser.parse(/\xA/) # => Regexp::Scanner::PrematureEndError: Premature end of pattern at \x

Is this a bug or intended behavior? Either is fine for my purposes, since I can just add a special check to ignore errors in this case, but I was curious whether this difference was intended. The coverage matrix in the README suggests that hex escapes work, but I guess this is a special case that was not highlighted. If it is intentional, it would be helpful to document it (unless I missed where this was already done); alternatively, parity with MRI would work for me.
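For illustration, the "special check" mentioned above could look roughly like the sketch below. The names `SINGLE_HEX_DIGIT_ESCAPE`, `single_digit_hex_escape?` and `parse_or_skip` are hypothetical and not part of mutant or regexp_parser. The parser is passed in as a callable so the snippet runs without the gem, and rescuing `StandardError` is an assumption that is deliberately broader than the specific `Regexp::Scanner::PrematureEndError`.

```ruby
# Hypothetical pattern: an unescaped `\x` followed by exactly one hex digit.
# The lookbehind and the backslash-pair group skip escaped backslashes, so
# the literal text `\\xA` (backslash, then "xA") is not flagged.
SINGLE_HEX_DIGIT_ESCAPE = /(?<!\\)(?:\\\\)*\\x\h(?!\h)/

# True if the regexp source contains a one-digit hex escape such as `\xA`.
def single_digit_hex_escape?(source)
  source.match?(SINGLE_HEX_DIGIT_ESCAPE)
end

# Calls `parser` with the regexp. If parsing fails and the source contains a
# one-digit hex escape, the error is swallowed and nil is returned; any other
# failure is re-raised unchanged.
def parse_or_skip(regexp, parser:)
  parser.call(regexp)
rescue StandardError
  raise unless single_digit_hex_escape?(regexp.source)
  nil
end

# Assumed usage with the gem loaded:
#   parse_or_skip(/\xA/, parser: Regexp::Parser.method(:parse))
```

A check like this only makes sense against versions that still have the bug; it also skips a pattern that fails for an unrelated reason if that pattern happens to contain a one-digit hex escape.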

Thanks!

jaynetics commented 3 years ago

@dgollahon thanks for the report, and for going to such lengths to check all kinds of regexps! ❤️

This one was also clearly a bug. We have had code to handle hex escapes with just one xdigit since the start; it has just been unreachable for all these 10 years 😄
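As a rough illustration of the rule involved (not regexp_parser's actual scanner code), a hex escape can be read as `\x` followed by one or two hex digits. The helper name `scan_hex_escape` and the use of `StringScanner` and `ArgumentError` are assumptions made for this sketch.

```ruby
require "strscan"

# Hypothetical helper: reads a hex escape at the scanner's current position.
# Returns the code point as an Integer, or nil if no `\x` starts here.
# Raises if `\x` is not followed by at least one hex digit.
def scan_hex_escape(scanner)
  return nil unless scanner.scan(/\\x/)

  # Accept one OR two hex digits, as MRI does for `\xA` and `\x0A`.
  digits = scanner.scan(/\h{1,2}/)
  raise ArgumentError, "premature end of pattern at \\x" if digits.nil?

  digits.hex
end

scan_hex_escape(StringScanner.new('\xA'))  # => 10
scan_hex_escape(StringScanner.new('\x41')) # => 65
```

A rule that only accepted `\h{2}` would reject `\xA`, which is the symptom reported in this issue.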

The fix is included in v2.0.1.

dgollahon commented 3 years ago

Fantastic! Thanks for the excellent response @jaynetics. :D