ammar / regexp_parser

A regular expression parser library for Ruby
MIT License
143 stars 22 forks source link

Suggestion / TBD: improve options handling #43

Closed jaynetics closed 6 years ago

jaynetics commented 7 years ago

Right now, it is hard to tell which options apply to a specific expression when passing in a ::Regexp with flags or when there are groups with option flags.

E.g.

P = Regexp::Parser
P.parse(/a/i).expressions[0].i?                      # => false 
P.parse(/(a)/i).expressions[0].i?                    # => false 
P.parse(/x(?i)a/).expressions.last.i?                # => false 
P.parse(/(?i:(a))/).expressions[0].expressions[0].i? # => false
P.parse(/(?i-i:a)/).expressions[0].i?                # => true (should be false)

To correctly determine wether a is case-insensitive, a user would have to keep track of the root options and any "currently active" option groups himself.

@ammar if you agree, I think we could improve this by setting a correct options hash for every Token, as follows:

ammar commented 7 years ago

@janosch-x That's an interesting problem. I can see different interpretations, depending on which layer is being considered (scanner, lexer, or parser), and the use case.

For me, so far at least, a Token is a mid level representation, that does not capture the semantics of execution for the entire expression, it only captures the semantics of definition for individual expressions.

On the other hand, I can see an Expression as a representation of the semantics of execution, not that I ever meant for them to be executable.

My initial take might be colored by the fact that all the examples you gave are for Expression. So, I'm wondering if Expressions could 'ask' the expression that contains them what options are 'in-effect', if they don't have that answer themselves.

I think there's a bug at play too. I would have expected the following (without .expressions[0]) to return true:

P.parse(/a/i).i?

The last example is weird :), but I agree. I expected false.

jaynetics commented 7 years ago

The use case is as follows:

I want to convert Ruby regexes to allow them to work on simpler regex engines, e.g. JavaScript. To substitute option groups, I have conversions such as the following in mind:

fancy_ruby = /ab(?i:cd[ef])/
simplified = /ab(?:[cC][dD][eEfF])/

This is hardly an essential feature, so I would not want to worsen the clarity or technical debt of regexp_parser to achieve it.

For me, so far at least, a Token is a mid level representation, that does not capture the semantics of execution for the entire expression, it only captures the semantics of definition for individual expressions.

I might not have fully understood this distinction, but don't Tokens already have some attributes that cross the boundary from individual definition to execution terms, e.g. set_level?

So, I'm wondering if Expressions could 'ask' the expression that contains them what options are 'in-effect', if they don't have that answer themselves.

That was my initial thought as well. The problem is that not only the containing expression is relevant, but also potentially all proceeding expressions. E.g.

/ (a (?i) b (c)) (?i: d (?-i) e) /x
# b, c and d are case-insensitive, a and e are not

That's why I thought that making "currently active options" a "state" of one of the three layers and setting them for every output object might make sense.

ammar commented 7 years ago

Thanks. I understand the intention now.

I might not have fully understood this distinction, but don't Tokens already have some attributes that cross the boundary from individual definition to execution terms, e.g. set_level?

I don't see set_level? as serving execution per se. I see it serving the parser. I might be splitting hairs here though, especially since execution was never a goal.

Your suggestion makes perfect sense. It supports the main use case for regexp_parser: transformations. Also it doesn't seem like it would cause any harm to add significant complexity.

Do you it will require a minor version bump to 0.5.0? And is there anything I can help with?

jaynetics commented 7 years ago

Ok, you've convinced me to add it to the Parser instead of the Lexer...

Do you think it will require a minor version bump to 0.5.0?

Imho a patch version bump will do, as the old behavior was kind of counter-intuitive and it would be strange for anyone to rely on it.

Also, I did without adding a new token type :options_switch yet. We can make that potentially breaking change and remove some redundancy in v1.0.0.

And is there anything I can help with?

Take a look at the PR :)