ammar / regexp_parser

A regular expression parser library for Ruby
MIT License
143 stars 22 forks source link

Parser is too greedy with alternation #2

Closed calfeld-zz closed 11 years ago

calfeld-zz commented 11 years ago

Example code:

#!/usr/bin/env ruby

require 'rubygems'
require 'regexp_parser'
require 'pp'

s = "prefi(?:x)a|b|c"

re = Regexp.new(s)
p "prefixa" =~ re
p "prefixb" =~ re
p "prefixc" =~ re

pp Regexp::Scanner.scan(s)
pp Regexp::Parser.parse(s)

Note that the regexp does match "prefixa", "prefixb", and "prefixc". However, the parse constructs an alternation node containing "prefix(?x)", "b", and "c" as the three alternatives.

The code in question is at parser.rb:94

when :alternation
  unless @node.token == :alternation
    alt = Alternation.new(token)
    seq = Sequence.new
    while @node.expressions.last
      seq.insert @node.expressions.pop
    end
    alt.alternative(seq)

    @node << alt
    @node = alt
    @node.alternative
  else
    @node.alternative
  end

The code

while @node.expression.last

is too greedy. Would it be correct to execute only absorb the last expression rather than all previous expressions?

ammar commented 11 years ago

Thanks for the report, but I'm not sure that I understand the problem.

The regexp is not matching "prefixa", "prefixb", and "prefixc". It is actually matching "prefixa", or "b", or "c". Try printing the match data ($~) to see what is happening, like this:

p "prefixa" =~ re, $~
p "prefixb" =~ re, $~
p "prefixc" =~ re, $~

That seems to match what the parser is doing. Or am I missing something?

Cheers!

calfeld-zz commented 11 years ago

Thank you for the prompt reply. You are quite right. I need to go fix some of our regexps.