Closed jaynetics closed 6 years ago
@janosch-x That's an interesting problem. I can see different interpretations, depending on which layer is being considered (scanner, lexer, or parser), and the use case.
For me, so far at least, a Token
is a mid level representation, that does not capture the semantics of execution for the entire expression, it only captures the semantics of definition for individual expressions.
On the other hand, I can see an Expression
as a representation of the semantics of execution, not that I ever meant for them to be executable.
My initial take might be colored by the fact that all the examples you gave are for Expression
. So, I'm wondering if Expressions
could 'ask' the expression that contains them what options are 'in-effect', if they don't have that answer themselves.
I think there's a bug at play too. I would have expected the following (without .expressions[0]
) to return true:
P.parse(/a/i).i?
The last example is weird :), but I agree. I expected false
.
The use case is as follows:
I want to convert Ruby regexes to allow them to work on simpler regex engines, e.g. JavaScript. To substitute option groups, I have conversions such as the following in mind:
fancy_ruby = /ab(?i:cd[ef])/
simplified = /ab(?:[cC][dD][eEfF])/
This is hardly an essential feature, so I would not want to worsen the clarity or technical debt of regexp_parser to achieve it.
For me, so far at least, a Token is a mid level representation, that does not capture the semantics of execution for the entire expression, it only captures the semantics of definition for individual expressions.
I might not have fully understood this distinction, but don't Tokens already have some attributes that cross the boundary from individual definition to execution terms, e.g. set_level
?
So, I'm wondering if Expressions could 'ask' the expression that contains them what options are 'in-effect', if they don't have that answer themselves.
That was my initial thought as well. The problem is that not only the containing expression is relevant, but also potentially all proceeding expressions. E.g.
/ (a (?i) b (c)) (?i: d (?-i) e) /x
# b, c and d are case-insensitive, a and e are not
That's why I thought that making "currently active options" a "state" of one of the three layers and setting them for every output object might make sense.
Thanks. I understand the intention now.
I might not have fully understood this distinction, but don't Tokens already have some attributes that cross the boundary from individual definition to execution terms, e.g. set_level?
I don't see set_level?
as serving execution per se. I see it serving the parser. I might be splitting hairs here though, especially since execution was never a goal.
Your suggestion makes perfect sense. It supports the main use case for regexp_parser: transformations. Also it doesn't seem like it would cause any harm to add significant complexity.
Do you it will require a minor version bump to 0.5.0? And is there anything I can help with?
Ok, you've convinced me to add it to the Parser instead of the Lexer...
Do you think it will require a minor version bump to 0.5.0?
Imho a patch version bump will do, as the old behavior was kind of counter-intuitive and it would be strange for anyone to rely on it.
Also, I did without adding a new token type :options_switch
yet. We can make that potentially breaking change and remove some redundancy in v1.0.0.
And is there anything I can help with?
Take a look at the PR :)
Right now, it is hard to tell which options apply to a specific expression when passing in a ::Regexp with flags or when there are groups with option flags.
E.g.
To correctly determine wether
a
is case-insensitive, a user would have to keep track of the root options and any "currently active" option groups himself.@ammar if you agree, I think we could improve this by setting a correct
options
hash for every Token, as follows:Token
class,:options
, in token.rb:11Parser::options
and move all option processing to theLexer
instead. imho currently active options are similar togroup_level
orset_level
.@options
based oninput
in the Lexer (use an empty Hash ifinput
is a ::String instead of a ::Regexp)@options
whenever encountering a group with option flags[:group, :options, ...]
into two tokens:[:group, :options_local, ...]
and[:group, :options_switch, ...]
, to allow differentiating between group-local option modifications (e.g./a(?i:b)c/
) and those that persist after the group is closed (e.g./a(?i)bc/
)@options
to every new Token created in lexer.rb:30(@options and @options[:m]) ? true : false
to!!@options[:m]