Closed zverok closed 4 years ago
the argument about usability and prevalence in other engines looks really convincing to me.
regexp_parser seems to be used more and more on regular expressions encountered "in the wild", and there is no good workaround for these cases there.
as far as rubocop goes, you might catch any PrematureEndError
and then skip the affected cop, but this is obviously not very appealing.
generally speaking, the current limitation might push people to either not use regexp_parser for such cases, or to attempt to pre-escape their input with their own regexp-scanning implementation, or catch the error and unexpectedly do nothing.
so i'm all on board. any opposition, @ammar?
if there is no documentation it might be best to look at the onigmo code for details about the behavior.
some things to keep in mind / investigate:
/\#{}/
(currently a NoMethodError)/{/
, /]/
etc.(
and )
always seem to be an errorThanks for the mention @jaynetics.
I also think that is convincing for some cases. Also, for ruby at least, the supported variants of regexp have stabilized (for example: https://bugs.ruby-lang.org/issues/8133#note-5)
I think the following cases are reasonable:
{
}
]
(leading){}
(empty)I have doubts about other empty or unbalanced cases, like ()
.
I can take a stab at it this coming weekend.
balanced curlies with content that doesn't match \d+(,\d*)?
are also consistently treated as literals across rubies and other languages:
/a{2, 3}/ =~ 'a{2, 3}' # => 0
/a{2,3,4}/ =~ 'a{2,3,4}' # => 0
empty ()
also seem to be widely supported and treated as group that always matches. there are some use cases for that, too - breaking runs of other elements such as backref numbers, or achieving a desired numbering for following captures.
Treating a {...}
that don't match \d+(,\d*)?
as literals makes sense.
I haven't encountered that usage of ()
. If it emits an empty group, instead of a literal, then that makes sense.
My understanding of the concerns of the parser has evolved over time. I no longer see "validation" as one of them. These cases enforce that understanding.
Hi, and thanks for the awesome gem!
Recently regexp_parser started to be used in Rubocop to check regexp redundancy.
That led to uncovering of what can be considered as a bug (rubocop bug: https://github.com/rubocop-hq/rubocop/issues/8083). When parsing regexps like this, for example:
/{.+}/
(which is valid Ruby regexp), regexp_parser fails (thinking that{}
is incorrect quantifier). Same is related to some other forms, like/]\[/
I found out that the matter was discussed at #15, with verdict being, that it
Actually, I believe that it is not "MRI quirk", but sane behavior of the Regexp parser, that some characters have special meaning only in context. The behavior about parsing
{something that is not a quantifier}
, and]
is consistent through:So, it seems that parser that fails on those cases becomes less useful than it might be.