golems / motion-grammar-kit

Formal Language Tools for Robots
BSD 2-Clause "Simplified" License
13 stars 5 forks source link

Documentation/regex #18

Open vityok opened 9 years ago

vityok commented 9 years ago

Hi,

I am collecting information for my pet project and would like to know more about the regex implementation in this library.

Could you please provide some background/explanations if this regex engine is intended to be used to process strings/sequences/vectors, etc. how is it used (API) what are its features?

Thanks, Victor

Tarrasch commented 9 years ago

Where's the regex engine used in this project? (I have kind of forgotten, been literally years since I poked around here)

vityok commented 9 years ago

Arrash,

I thought that regular expressions are used in this project because of the regex.lisp file in the sources. A brief look at its contents shows that it deals with converting regular expressions (in some form) into DFA and NFA using McNaughton-Yamada-Thompson algorithm.

However, documentation is really rigid and in order to better understand what is going on, one must look a bit deeper.

If you still can recall, could you please provide some more information what is this code about?

Thanks, Victor

Tarrasch commented 9 years ago

We do regex stuff in the parser generator implementations. Doing sorts of "Dragoon bookie" stuff. To generate a LL(1) or LL(*) parser, it's useful to go back and forth between finite [non]deterministic state machines and the equivalent regex representation.

Does this give you a slightly better picture? I have no idea what @ndantam does nowadays though.

ndantam commented 9 years ago

The regex and FA implementations here are not really aimed at text processing or pattern matching, but at general symbolic analysis -- i.e., we do not assume that the terminal symbol alphabet consists of characters, and we implement various set operations on the regular languages.

You could build a text pattern matcher on top of this work using the resulting minimized DFA. Compared to CL-PPCRE -- which I believe is not doing DFA minimization -- this may produce a more compact and efficient matcher. However, some features of typical regex/pattern engines such as capture groups may be more difficult to implement.

If you are interested in improving the state of Common Lisp text pattern matching, I personally would like to see a pattern matcher that operates on ropes (https://en.wikipedia.org/wiki/Rope_(data_structure)) instead of just array-based strings.

vityok commented 9 years ago

Hi Neil, @Tarrasch,

thanks for your explanation. My goal is not so bold as to improve the state of regular expressions engines in Common Lisp, just to learn a bit of different algorithms and programming techniques.

Though I do have a very practical interest in Boyer-Moore-Horspool and Commentz-Walter implementations.

BTW, I've created a ticket for creating a BMH matcher for ropes and it looks like it will be not that difficult (for the ropes implementation by @Ramarren).

Thanks, Victor