blynn / nex

Lexer for Go
http://cs.stanford.edu/~blynn/nex/
GNU General Public License v3.0
418 stars, 47 forks

shortest possible match | non-greedy match #12

Closed: sprungknoedl closed this issue 11 years ago

sprungknoedl commented 11 years ago

I couldn't find any way to get the shortest possible match (non-greedy behaviour).

There should be a way to split the following snippet:

<?php
  b
?>
text
<?php
  a
?>

into these two matches:

<?php
  b
?>

and

<?php
  a
?>

Currently the regex /<\?php.*\?>/ matches the whole text.
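To make the requested behaviour concrete, here is a small sketch using Go's standard `regexp` package (RE2 syntax), not nex. It assumes `(?s)` so that `.` also matches newlines; whether nex's `.` behaves the same way is not covered here. The greedy `.*` runs to the last `?>`, while the non-greedy `.*?` stops at the first one:

```go
package main

import (
	"fmt"
	"regexp"
)

// input reproduces the snippet from the issue: two PHP blocks separated by text.
const input = "<?php\n  b\n?>\ntext\n<?php\n  a\n?>\n"

// (?s) lets '.' match newlines, so a match can span several lines.
var (
	greedy = regexp.MustCompile(`(?s)<\?php.*\?>`)  // '.*' extends to the LAST "?>"
	lazy   = regexp.MustCompile(`(?s)<\?php.*?\?>`) // '.*?' stops at the FIRST "?>"
)

func main() {
	fmt.Println(len(greedy.FindAllString(input, -1))) // 1: one match spanning both blocks
	fmt.Println(len(lazy.FindAllString(input, -1)))   // 2: one match per block
	for _, m := range lazy.FindAllString(input, -1) {
		fmt.Printf("%q\n", m)
	}
}
```

Go's `regexp` uses leftmost-first matching by default, which is why `*?` has an effect here.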

Or did I simply miss something? Thanks

blynn commented 11 years ago

This behaviour has been inherited from Flex: http://flex.sourceforge.net/manual/Why-doesn_0027t-flex-have-non_002dgreedy-operators-like-perl-does_003f.html. In short, they recommend moving the logic from the lexer to the parser, or using start conditions (which nex does not support yet).
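Flex-style lexers pick the longest match, and under that rule a non-greedy `*?` cannot shorten anything. One pattern-level workaround, the same trick commonly used for C comments, is to write a body that can never contain the terminator `?>`, so every match must end at the first one. The sketch below uses Go's `regexp` with `Longest()` as a stand-in for leftmost-longest lexer semantics; the helper `longest` is a hypothetical name, and whether nex accepts this exact syntax (for example the non-capturing `(?:...)` group) is an assumption:

```go
package main

import (
	"fmt"
	"regexp"
)

const input = "<?php\n  b\n?>\ntext\n<?php\n  a\n?>\n"

// longest compiles expr and switches it to leftmost-longest matching,
// the rule a flex-style lexer uses to choose between candidate matches.
func longest(expr string) *regexp.Regexp {
	re := regexp.MustCompile(expr)
	re.Longest()
	return re
}

// phpBlock: "<?php", then a body built from either a non-'?' character or a
// run of '?' followed by something other than '?' or '>', then "?>".
// The body can never contain "?>", so the match ends at the first "?>".
// (In Go's default syntax, [^?] also matches newlines.)
var phpBlock = longest(`<\?php(?:[^?]|\?+[^?>])*\?+>`)

// lazyBlock relies on a non-greedy operator, which longest-match ignores.
var lazyBlock = longest(`(?s)<\?php.*?\?>`)

func main() {
	fmt.Println(len(lazyBlock.FindAllString(input, -1))) // 1: '.*?' no longer helps
	fmt.Println(len(phpBlock.FindAllString(input, -1)))  // 2: one match per block
}
```

The cost of this approach is readability: the pattern grows quickly for longer terminators, which is one reason start conditions are usually preferred.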

However, I'm willing to add non-greedy operators to nex if I can find the time. Stay tuned!

sprungknoedl commented 11 years ago

Thanks for your help :+1:. I moved that logic into goyacc, but the resulting code is not the prettiest. For my use case it led to a lot of code duplication, because I'm only interested in the tokens inside PHP code.
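When only the tokens inside PHP code matter, another option (sketched here as an assumption, not what the commenter did) is to emulate start conditions with an explicit state variable in a small pre-pass, and hand only the PHP bodies to the lexer. All names below (`state`, `inText`, `inPHP`, `phpBodies`) are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// state mimics a flex start condition with a plain variable.
type state int

const (
	inText state = iota // outside PHP: skip everything
	inPHP               // inside <?php ... ?>: keep everything
)

// phpBodies returns the source between each "<?php" and the next "?>".
// A final block with no closing "?>" is returned up to the end of input.
func phpBodies(src string) []string {
	var bodies []string
	st := inText
	for {
		switch st {
		case inText:
			i := strings.Index(src, "<?php")
			if i < 0 {
				return bodies // no more PHP blocks
			}
			src = src[i+len("<?php"):]
			st = inPHP
		case inPHP:
			j := strings.Index(src, "?>")
			if j < 0 {
				return append(bodies, src) // unterminated block
			}
			bodies = append(bodies, src[:j])
			src = src[j+len("?>"):]
			st = inText
		}
	}
}

func main() {
	src := "<?php\n  b\n?>\ntext\n<?php\n  a\n?>\n"
	for _, body := range phpBodies(src) {
		fmt.Printf("%q\n", body) // "\n  b\n", then "\n  a\n"
	}
}
```

Because `strings.Index` always finds the first `?>`, the shortest-match behaviour falls out of the state switch rather than the regex. Note that this naive sketch does not handle `?>` inside PHP string literals or comments.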

I look forward to your changes :)

purpleidea commented 7 years ago

For anyone looking for the Flex FAQ page linked above (the link is now broken), here is a mirror:

http://www.cas.mcmaster.ca/~kahl/SE3E03/2006/flex/flex_82.html

(Since nex seems largely unmaintained, I've resorted to reading closed issues for hints! Let's all share whatever we learn somewhere like here!)