ammar / regexp_parser

A regular expression parser library for Ruby
MIT License

Alternations are sometimes not correctly parsed #6

Closed: camertron closed this issue 10 years ago

camertron commented 10 years ago

Consider this regex (for matching postal codes from Ecuador): /[A-Z]\d{4}[A-Z]|(?:[A-Z]{2})?\d{6}/. The alternation should split the regex into two top-level branches, since the alternation operator has the lowest precedence of all regex operators. The AST should look like this:

                root
                 |
                alt
               /   \
[A-Z]\d{4}[A-Z]    (?:[A-Z]{2})?\d{6}
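To make the expected split concrete, here is a rough sketch that divides a regex source string on top-level "|" only, that is, pipes outside any group or character class. It does not use regexp_parser at all; the helper name split_top_level_alternation is made up for illustration, and it ignores details such as nested character classes.

```ruby
# Illustrative helper (not part of regexp_parser): split a regex source
# on "|" characters that sit outside groups (...) and classes [...].
def split_top_level_alternation(source)
  branches = [String.new]
  depth = 0          # current group nesting depth
  in_class = false   # true while inside [...]
  escaped = false    # true right after a backslash

  source.each_char do |ch|
    if escaped
      # The character after a backslash is always taken literally.
      branches.last << ch
      escaped = false
      next
    end

    case ch
    when '\\' then escaped = true
    when '['  then in_class = true
    when ']'  then in_class = false
    when '('  then depth += 1 unless in_class
    when ')'  then depth -= 1 unless in_class
    when '|'
      if depth.zero? && !in_class
        # Lowest precedence: a top-level pipe starts a new branch.
        branches << String.new
        next
      end
    end
    branches.last << ch
  end
  branches
end

POSTAL_CODE = '[A-Z]\d{4}[A-Z]|(?:[A-Z]{2})?\d{6}'
branches = split_top_level_alternation(POSTAL_CODE)
puts branches
# prints:
# [A-Z]\d{4}[A-Z]
# (?:[A-Z]{2})?\d{6}
```

Both \d{6} and the optional group end up in the second branch, which is the shape the tree above describes.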

However, the AST generated by regexp_parser looks like this:

                    root
                    /  \
                 alt    \d{6}
                /   \
[A-Z]\d{4}[A-Z]      (?:[A-Z]{2})?

I'm not sure how to go about fixing this, any thoughts?

ammar commented 10 years ago

Hi @camertron. Thanks for pointing it out. I will try to look at it this weekend.

I would start by looking at the meta method in the parser (lib/parser.rb). That's where the alternation sequences get "compiled". Perhaps it needs to look back further than the last node when it's a quantifier or a quantified group? Not sure. I would have to dig in to get a better idea.

Cheers!

ammar commented 10 years ago

Insomnia, so...

The problem happens when the last group is closed and the nesting is exited. Adding the following as the last line to the close_group method in the parser fixes the issue, and does not break any tests.

@node = @node.last if @node.last.is_a?(Alternation)

I will check this in tomorrow, after I run a few more tests, with a fresh brain.
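For readers who want to see why re-entering the alternation matters, here is a toy model of the situation. It is only a sketch under assumptions: ToyParser, Node, the token format, and the reenter_alternation flag are all invented for this example and are not the gem's actual code. The only part taken from the thread is the idea of stepping back into a trailing alternation after a group closes.

```ruby
# Toy tree node; to_s renders the tree on one line.
Node = Struct.new(:type, :text, :children) do
  def last
    children.last
  end

  def to_s
    return text if type == :literal
    "#{type}(#{children.map(&:to_s).join(', ')})"
  end
end

# Hypothetical miniature parser, NOT regexp_parser's implementation.
class ToyParser
  def initialize(reenter_alternation:)
    @reenter = reenter_alternation
  end

  def parse(tokens)
    root = Node.new(:root, nil, [])
    @node = root
    @nesting = [root]
    tokens.each do |type, text|
      case type
      when :open  then open_group
      when :close then close_group
      when :alt   then alternation
      else append(Node.new(:literal, text, []))
      end
    end
    root
  end

  private

  # Inside an alternation, new nodes go into its last branch.
  def append(node)
    target = @node.type == :alt ? @node.last : @node
    target.children << node
  end

  def alternation
    if @node.type == :alt
      @node.children << Node.new(:seq, nil, [])
    else
      # Everything parsed so far becomes the first branch.
      first = Node.new(:seq, nil, @node.children.dup)
      alt = Node.new(:alt, nil, [first, Node.new(:seq, nil, [])])
      @node.children.replace([alt])
      @node = alt
    end
  end

  def open_group
    group = Node.new(:group, nil, [])
    append(group)
    @nesting << group
    @node = group
  end

  def close_group
    @nesting.pop
    @node = @nesting.last
    # The one-line idea from the comment above: step back into the alternation.
    @node = @node.last if @reenter && @node.last&.type == :alt
  end
end

# Tokens for the pattern a|(b)c
TOKENS = [[:literal, 'a'], [:alt], [:open], [:literal, 'b'], [:close], [:literal, 'c']]
buggy = ToyParser.new(reenter_alternation: false).parse(TOKENS).to_s
fixed = ToyParser.new(reenter_alternation: true).parse(TOKENS).to_s
puts buggy  # root(alt(seq(a), seq(group(b))), c)
puts fixed  # root(alt(seq(a), seq(group(b), c)))
```

Without the re-entry step, the literal after the group lands on the root next to the alternation, which mirrors the wrong tree in the report. With it, the literal stays in the second branch.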

camertron commented 10 years ago

Great, thanks @ammar! You're always so fast answering and fixing these issues, thank you :)

ammar commented 10 years ago

My pleasure. Always happy to help. Also happy that some have found applications for this rather esoteric gem. Cheers!