jaynetics closed 6 years ago
Turns out I should have read the docs... https://github.com/k-takata/Onigmo/blob/79114095/doc/RE#L155-L156
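Intersections apply to all expressions in their set, not just the adjacent ones: `&&` has a lower precedence than everything except `^`, so each operand extends to the enclosing brackets or to the next `&&`.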
```ruby
'abc1'.scan(/[a b \d && b c [:digit:]]/x) # => ["b", "1"]
'abc1'.scan(/[^a b \d && b c [:digit:]]/x) # => ["a", "c"]
```
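A quick way to check this reading in plain Ruby is to compare the implicit form with one where each operand is wrapped in its own nested class. This sketch drops the `/x` flag and the spaces so that whitespace handling inside the class plays no role:

```ruby
# && has the lowest precedence inside a bracket expression (only ^ binds
# more loosely), so each operand runs up to the brackets or the next &&.
implicit = /[ab\d&&bc[:digit:]]/
explicit = /[[ab\d]&&[bc[:digit:]]]/ # the same set with operands spelled out

p 'abc1'.scan(implicit) # => ["b", "1"]
p 'abc1'.scan(explicit) # => ["b", "1"]

# A leading ^ negates the whole intersection, not just the left operand.
p 'abc1'.scan(/[^ab\d&&bc[:digit:]]/) # => ["a", "c"]
```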
So maybe Intersection parse results need to look somewhat like this:
```ruby
RP.parse(/[a&&b]/).first.first # =>
# #<Intersection @expressions=[
#   #<Intersection::Left @expressions=[
#     #<Literal @text="a"/>
#   ]/>,
#   #<Intersection::Right @expressions=[
#     #<Literal @text="b"/>
#   ]/>
# ]/>
```
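Now that would require quite a bit of tree restructuring while parsing: by the time the parser reaches `&&`, the expressions before it have already been added to the set and would have to be moved under a new `Left` node.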
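A minimal sketch of that restructuring, assuming a streaming parser that appends each expression to the current set as soon as its token arrives. The token format (`[:literal, 'a']`, `[:intersection]`), the node structs and `build_set` are hypothetical stand-ins, not regexp_parser's scanner output or classes:

```ruby
# Hypothetical node types mirroring the Intersection/Left/Right tree sketched
# above; these are stand-ins, not regexp_parser's real classes.
Literal      = Struct.new(:text)
SetNode      = Struct.new(:expressions)
Intersection = Struct.new(:expressions)
Left         = Struct.new(:expressions)
Right        = Struct.new(:expressions)

# Builds a set from a hypothetical flat token stream such as
# [[:literal, 'a'], [:intersection], [:literal, 'b']].
def build_set(tokens)
  set = SetNode.new([])
  target = set # node that receives the next literal

  tokens.each do |type, text|
    case type
    when :literal
      target.expressions << Literal.new(text)
    when :intersection
      # Everything already added to the set was the left operand, so it is
      # moved out of the set and under a new Left node.
      left = Left.new(set.expressions)
      right = Right.new([])
      set.expressions = [Intersection.new([left, right])]
      target = right # later literals belong to the right operand
    else
      raise ArgumentError, "unknown token type: #{type.inspect}"
    end
  end

  set
end

tree = build_set([[:literal, 'a'], [:intersection], [:literal, 'b']])
p tree.expressions.first.expressions.map(&:class) # => [Left, Right]
```

With a second `&&`, this sketch wraps the first `Intersection` in a new `Left`, i.e. it treats a chain as left-nested binary intersections.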
Not to mention that there can be more than one intersection:
```ruby
'abc1&'.scan(/[abc && ab && bc]/x) # => ["b"]
```
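Whatever the tree ends up looking like, the matching rule itself is n-ary: a character belongs to the set only if every operand contains it. A small model of `[abc && ab && bc]`, where each operand is simplified to a Ruby `Set` of literal characters (no ranges, escapes or POSIX classes; `in_intersection?` is a hypothetical helper):

```ruby
require 'set'

# [abc && ab && bc] modelled as a list of operands, each simplified to a Set
# of literal characters (no ranges, escapes or POSIX classes; spaces dropped).
operands = %w[abc ab bc].map { |chars| Set.new(chars.chars) }

# A character is in the intersection only if every operand contains it, so
# any number of && operators can be handled by one n-ary check.
def in_intersection?(operands, char)
  operands.all? { |operand| operand.include?(char) }
end

p 'abc1&'.chars.select { |c| in_intersection?(operands, c) } # => ["b"]
p operands.reduce(:&).to_a                                   # => ["b"]
p 'abc1&'.scan(/[abc&&ab&&bc]/) # => ["b"] (Ruby's own engine agrees)
```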
Another option could be to treat Sets as groups of Sequences by default, which, however, might make them harder to handle just for the sake of this somewhat exotic feature.
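For comparison, a rough sketch of that alternative layout, in which every set holds a list of sequences, one per `&&` operand. `Sequence`, `SequencedSet`, the `:and` marker and `set_of_sequences` are hypothetical names; the point is that even a plain `[abc]` gains an extra level that every consumer has to unwrap:

```ruby
# Hypothetical "sets as groups of sequences" layout: a set always holds a
# list of sequences, one per && operand (a single one when there is no &&).
Sequence     = Struct.new(:expressions)
SequencedSet = Struct.new(:sequences)

# Groups a flat list of members into sequences, splitting on the
# hypothetical :and marker that stands in for &&.
def set_of_sequences(members)
  sequences = [Sequence.new([])]
  members.each do |member|
    if member == :and
      sequences << Sequence.new([]) # start the next operand
    else
      sequences.last.expressions << member
    end
  end
  SequencedSet.new(sequences)
end

plain = set_of_sequences(%w[a b c])             # models [abc]
inter = set_of_sequences(['a', 'b', :and, 'b']) # models [ab&&b]

# Even the common case without && needs an extra unwrapping step.
p plain.sequences.size               # => 1
p plain.sequences.first.expressions  # => ["a", "b", "c"]
p inter.sequences.map(&:expressions) # => [["a", "b"], ["b"]]
```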
Hmmm ...
I'm reasonably happy with this now ...
This makes `CharacterSet` a standard `Subexpression`, as suggested in https://github.com/ammar/regexp_parser/issues/47#issue-275073366.
All equivalent tokens result in the same `Scanner` and `Parser` emissions as outside of sets.
New `CharacterSet::Range` and `CharacterSet::Intersection` expressions represent the respective trees.
Other notable changes are:
@ammar What do you think? The commit messages provide a bit more explanation if you are wondering about some of the changes, but feel free to suggest any other solution.
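To illustrate the benefit of making `CharacterSet` a standard `Subexpression`, here is a toy model of `[a-c&&b]`. The `Toy` module, its `each_leaf` walker and the per-operand containers are hypothetical and only borrow the class names mentioned above; they are not regexp_parser's implementation. A generic depth-first walk reaches literals inside the range and the intersection without any set-specific code:

```ruby
module Toy
  Literal = Struct.new(:text)

  # Generic container: any node that has child expressions.
  class Subexpression
    attr_reader :expressions

    def initialize(*expressions)
      @expressions = expressions
    end

    # Depth-first walk yielding every leaf; it does not care which
    # container subclass it is visiting.
    def each_leaf(&block)
      expressions.each do |exp|
        if exp.is_a?(Subexpression)
          exp.each_leaf(&block)
        else
          block.call(exp)
        end
      end
    end
  end

  # The set itself is just another container.
  class CharacterSet < Subexpression
    class Range < Subexpression; end        # children: start and end literal
    class Intersection < Subexpression; end # children: one container per operand
  end
end

# Models [a-c&&b]
set = Toy::CharacterSet.new(
  Toy::CharacterSet::Intersection.new(
    Toy::Subexpression.new(
      Toy::CharacterSet::Range.new(Toy::Literal.new('a'), Toy::Literal.new('c'))
    ),
    Toy::Subexpression.new(Toy::Literal.new('b'))
  )
)

leaves = []
set.each_leaf { |leaf| leaves << leaf.text }
p leaves # => ["a", "c", "b"]
```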