Dervall / Piglet

An easier to use parsing and lexing tool that is configurable in fluent code without a pre-build step.
binarysculpting.com
MIT License
91 stars 11 forks

[DO NOT MERGE YET] Added generic parser extensions #58

Open Unknown6656 opened 4 years ago

Unknown6656 commented 4 years ago

Disclaimer: This is a pull request which is not yet ready for merging due to the following reasons:

I will notify you (@Dervall) when this branch is ready for merging (this could take a few months ... depending on how much time I have).


I changed a couple of things on my fork during the past 2..3 years:


EDIT: ToDo-List of all covered and open points:

Unknown6656 commented 4 years ago

.... And by the way: Thank you SOOOOOO much for creating this wonderful project! I have used it already in half a dozen projects (all of them compilers or interpreters). This library is wonderful and IMHO way easier to use than lexer/parser generators.

Unknown6656 commented 4 years ago

Note to myself: I should move away from T4 templates, as they are not supported on non-Windows operating systems.

EDIT: Done.

Dervall commented 4 years ago

Thank you for your contributions! I haven't been exactly active in developing this library, and I'm super happy that you like it and that you are using it.

Could you write a little text about what sort of changes you are proposing? I'm not exactly sure what a generic extension is to be honest :)

Also, do you remember what sort of bugs you found and fixed?

I've taken a quick look now, and will look through it in more detail when you feel that you are ready with your changes. Thanks again!

Unknown6656 commented 4 years ago

@Dervall I have to admit that I do not quite remember the bugs from the 2017/2018 commits (I mainly recall NullReferenceExceptions and improvements to grammar debugging). However, I can give you a small example of my generic extension (I should find a fancier description for that feature):


Imagine having the following grammar:

rectangle := "(" point "," size ")"                                 // (1)
           | "(" number "," number "," number "," number ")"        // (2)

point := "(" number "," number ")"                                  // (3)

size := "(" number "," number ")"                                   // (4)

number := [ "+" | "-" ] \d+

You could of course use object as the type for all the data stored in the different symbols (terminals and non-terminals). However, it would be wiser to use a type-safe approach, such as generics. To do so, one creates a parser constructor by inheriting from the abstract class Piglet.Parser.Configuration.Generic.ParserConstructor<T>:

public class RectangleParserConstructor
    : ParserConstructor<Rectangle> // one must inherit 'ParserConstructor<T>'
{
    // implement the abstract method 'void Construct(NonTerminalWrapper<T>)'.
    protected override void Construct(NonTerminalWrapper<Rectangle> start_symbol)
    {
        // this is my naming convention for this example:
        //  t_xxx := terminal symbol
        // nt_xxx := non-terminal symbol

        // create all the terminal and non-terminal symbols:
        NonTerminalWrapper<Point> nt_point = CreateNonTerminal<Point>();
        NonTerminalWrapper<Size> nt_size = CreateNonTerminal<Size>();
        TerminalWrapper<string> t_comma = CreateTerminal(@",");
        TerminalWrapper<string> t_open_parenthesis = CreateTerminal(@"\(");
        TerminalWrapper<string> t_close_parenthesis = CreateTerminal(@"\)");
        TerminalWrapper<int> t_number = CreateTerminal<int>(@"[+\-]?\d+", int.Parse);

        // rule (1)
        start_symbol.AddProduction(t_open_parenthesis, nt_point, t_comma, nt_size, t_close_parenthesis)
                    .SetReduceFunction((_, point, _, size, _) => new Rectangle(point, size));

        // rule (2)
        start_symbol.AddProduction(t_open_parenthesis, t_number, t_comma, t_number, t_comma, t_number, t_comma, t_number, t_close_parenthesis)
                    .SetReduceFunction((_, x, _, y, _, width, _, height, _) => new Rectangle(x, y, width, height));

        // rule (3)
        nt_point.AddProduction(t_open_parenthesis, t_number, t_comma, t_number, t_close_parenthesis)
                .SetReduceFunction((_, x, _, y, _) => new Point(x, y));

        // rule (4)
        nt_size.AddProduction(t_open_parenthesis, t_number, t_comma, t_number, t_close_parenthesis)
                .SetReduceFunction((_, width, _, height, _) => new Size(width, height));

        // I could change operator precedence, associativity, etc. here
        // I could also configure the parser to be case insensitive
    }
}
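
The type-safety argument can be illustrated without Piglet at all. The following is only a sketch with plain delegates: the class and member names are made up for this illustration and are not part of the library or of my fork. It contrasts an object-based reduce function for rule (3), which needs run-time casts, with a typed one whose signature carries the value type of every symbol.

```csharp
using System;

// Hypothetical illustration only; none of these names exist in Piglet.
public static class ReduceFunctionContrast
{
    // Untyped style: every symbol value arrives as 'object',
    // so each use needs a cast that is only checked at run time.
    public static object ReducePointUntyped(object[] symbols)
        => ((int)symbols[1], (int)symbols[3]);

    // Typed style: the delegate signature states the value type of each symbol,
    // so passing a string where a number is expected fails at compile time.
    public static readonly Func<string, int, string, int, string, (int X, int Y)> ReducePointTyped =
        (_, x, _, y, _) => (x, y);

    public static void Main()
    {
        object[] symbols = { "(", -10, ",", 20, ")" };

        // The untyped result has to be cast back before it can be used.
        var untyped = ((int X, int Y))ReducePointUntyped(symbols);
        var typed = ReducePointTyped("(", -10, ",", 20, ")");

        Console.WriteLine(untyped == typed);
    }
}
```

Swapping two elements of the object array compiles fine and throws an InvalidCastException when the reduce function runs, whereas the same mistake with the typed delegate does not compile.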

To use this parser, only a few lines are needed:

static void Main()
{
    var constructor = new RectangleParserConstructor();
    var parser = constructor.CreateParser();

    ParserResult<Rectangle> result = parser.Parse("((-10, 20), (100, 300))");

    Console.WriteLine(result.ParsedValue); // the parsed result (this has the type 'Rectangle'!!)
    Console.WriteLine(result.LexedTokens); // a list of lexed tokens
}
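
The snippets above assume Rectangle, Point and Size types that are not shown. A minimal sketch of what they could look like follows; the property names (X, Y, Width, Height, Origin, Extent) are assumptions, and only the constructor shapes are taken from the reduce functions in rules (1) to (4).

```csharp
using System;

// Assumed data types for the example; records give value equality and a readable ToString().
public sealed record Point(int X, int Y);

public sealed record Size(int Width, int Height);

public sealed record Rectangle(Point Origin, Size Extent)
{
    // Rule (2) supplies four plain numbers, so forward them to the (Point, Size) shape.
    public Rectangle(int x, int y, int width, int height)
        : this(new Point(x, y), new Size(width, height))
    {
    }
}
```

With records, both grammar alternatives produce equal values for the same numbers, so new Rectangle(-10, 20, 100, 300) compares equal to new Rectangle(new Point(-10, 20), new Size(100, 300)).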


Couple of points worth mentioning:

[The rectangle example above is rather boring and not very creative, but you definitely get the idea of generic parsers.]