Dervall / Piglet

An easier to use parsing and lexing tool that is configurable in fluent code without a pre-build step.
binarysculpting.com
MIT License

Parser problem with a simple Fluent configuration #54

Open stevehobbsdev opened 9 years ago

stevehobbsdev commented 9 years ago

In an effort to learn Piglet (which is excellent, by the way - nice work!), I'm trying to construct a parser based on the Turtle sample. I've come to a point where I want to start declaring and parsing variables, but if I try and work with anything that isn't a quoted string, it fails.

I've included the source that I'm working with below.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Piglet.Parser;

namespace Piglet2
{
    class Car
    {
        public string Name { get; set; }

        public void Move(double distance)
        {
            Console.WriteLine("Moving {0}", distance);
        }

        public void Rotate(double angle)
        {
            Console.WriteLine("Rotating {0}", angle);
        }

        public void EngineOff()
        {
            Console.WriteLine("Turning engine off");
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var variables = new Dictionary<string, object>();

            var input = @"
                car NewCar
                var somevar = ""My car""
                var x = 10
                rotate 20
                move 50.2
                rotate 20.7
                engineoff";

            var car = new Car();

            var config = ParserFactory.Fluent();
            var program = config.Rule();
            var nameDecl = config.Rule();
            var statements = config.Rule();
            var statement = config.Rule();
            var variableDecl = config.Rule();

            var value = config.Rule();

            var symbol = config.Expression();
            symbol.ThatMatches(@"([a-zA-Z]+)").AndReturns(f => f);

            program.IsMadeUp.By(nameDecl).Followed.By(statements);

            statements.IsMadeUp.ByListOf(statement);

            statement.IsMadeUp
                .By(variableDecl)
                .Or.By("engineoff").WhenFound(f =>
                {
                    car.EngineOff();
                    return null;
                })
                .Or.By("move").Followed.By<double>().As("Distance").WhenFound(f =>
                {
                    car.Move(f.Distance);
                    return null;
                })
                .Or.By("rotate").Followed.By<double>().As("Distance").WhenFound(f =>
                {
                    car.Rotate(f.Distance);
                    return null;
                });

            variableDecl.IsMadeUp.By("var")
                .Followed.By(symbol).As("Name")
                .Followed.By("=")
                .Followed.By(value).As("Value")
                .WhenFound(f =>
                {
                    variables[f.Name] = f.Value;
                    return null;
                });

            nameDecl.IsMadeUp.By(@"car")
                .Followed.By(symbol).As("ProgramName")
                .WhenFound(f => car.Name = f.ProgramName);

            value.IsMadeUp
                .By(config.QuotedString)
                .Or.By<int>();

            var parser = config.CreateParser();

            try
            {
                parser.Parse(input);
            }
            catch (ParseException ex)
            {
                throw;
            }

            Console.WriteLine("Parsed program {0}", car.Name);
            Console.WriteLine("Variables found:");

            foreach (var item in variables)
            {
                Console.WriteLine("{0} = {1}", item.Key, item.Value);
            }

            Console.ReadKey();
        }
    }
}

As you can see, I've specified that a 'value' is made up of a quoted string or an integer. The crux of the issue is that if I include the line "var x = 10" in my input, the parser fails with this exception:

Illegal token \d+(\.\d+)?. Expected "(\\.|[^"])*",\d+

If I take it out, the input gets parsed successfully. I've tried this with booleans (both using .By<bool>() and also by specifying a custom boolean expression) and that fails too.

Thanks for taking a look - just wondering if I'm missing something obvious!

Dervall commented 9 years ago

Ok, this has to do with the way that the lexer handles your mix of doubles and ints.

Basically, the lexer greedily matches whatever is thrown at it. When you specify both integers and doubles in your configuration by using By<int>() and By<double>(), you're saying that the lexer should match the pattern \d+ for integers and \d+(\.\d+)? for doubles. Notice that the string "10" matches BOTH patterns.

The order in which you declare these matters as well. Because you're using By<double>() before By<int>(), you're saying that doubles are more important than ints, and that the lexer should prefer to match doubles. The lexer therefore matches a double for the literal 10 and finds no rule for that in your variable declaration.

Moving the By<int>() declaration up will cause problems, because the numbers in your commands to move the car will then be matched as ints instead, and that fails since there's no rule accepting ints in those commands.
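To make the ordering effect concrete, here is a toy model in plain C#. It is only a sketch under the assumption that the lexer keeps the longest match and breaks ties in favour of the pattern declared first; the ToyLexer class and its members are hypothetical names and this is not Piglet's actual implementation. The two patterns are the ones from the error message above.

```csharp
using System.Text.RegularExpressions;

// Toy model only, NOT Piglet's lexer: try every pattern at the start of the
// input, keep the longest match, and let the earliest declaration win ties.
static class ToyLexer
{
    // The two number patterns from this thread, in both declaration orders.
    public static readonly (string Name, string Pattern)[] DoubleFirst =
    {
        ("double", @"\d+(\.\d+)?"),
        ("int", @"\d+"),
    };

    public static readonly (string Name, string Pattern)[] IntFirst =
    {
        ("int", @"\d+"),
        ("double", @"\d+(\.\d+)?"),
    };

    public static string Classify(string input, (string Name, string Pattern)[] patterns)
    {
        string winner = "(no match)";
        int longest = 0;

        foreach (var (name, pattern) in patterns)
        {
            // ^ anchors the attempt at the first character of the input.
            Match match = Regex.Match(input, "^(?:" + pattern + ")");

            // Strictly greater: an equally long later match never displaces
            // a pattern that was declared earlier.
            if (match.Success && match.Length > longest)
            {
                winner = name;
                longest = match.Length;
            }
        }

        return winner;
    }
}
```

Under this model, ToyLexer.Classify("10", ToyLexer.DoubleFirst) comes back as "double", which is the token the variable declaration has no rule for. With ToyLexer.IntFirst, "10" becomes "int", but so does the "20" in rotate 20, while "50.2" is still a "double" because that match is longer.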

The solution is to define a further rule for a numeric value, that is matched by both ints and doubles.

        var numericValue = config.Rule();
        numericValue.IsMadeUp.By<int>().As("IntValue").WhenFound(f => f.IntValue)
            .Or.By<double>().As("DoubleValue").WhenFound(f => f.DoubleValue);

        var value = config.Rule();

        value.IsMadeUp
            .By(config.QuotedString)
            .Or.By(numericValue);

        var symbol = config.Expression();
        symbol.ThatMatches(@"([a-zA-Z]+)").AndReturns(f => f);

        program.IsMadeUp.By(nameDecl).Followed.By(statements);

        statements.IsMadeUp.ByListOf(statement);

        statement.IsMadeUp
            .By(variableDecl)
            .Or.By("engineoff").WhenFound(f =>
            {
                car.EngineOff();
                return null;
            })
            .Or.By("move").Followed.By(numericValue).As("Distance").WhenFound(f =>
            {
                car.Move(f.Distance);
                return null;
            })
            .Or.By("rotate").Followed.By(numericValue).As("Distance").WhenFound(f =>
            {
                car.Rotate(f.Distance);
                return null;
            });

This accomplishes two things: it declares the correct order, in which matching ints is more important than matching doubles, and it makes the commands accept both ints and doubles. For your functions, it will convert the ints back to doubles and call them appropriately.
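The int-to-double conversion can be illustrated with ordinary C# run-time binding. This is a sketch that assumes the value arrives as a dynamic, as the f.Distance syntax in the callbacks suggests; it does not describe Piglet's internals, and DynamicArgumentSketch and MoveWithParsedValue are hypothetical names.

```csharp
using System.Globalization;

// Sketch of run-time binding with `dynamic`; an assumption about how the
// parsed value reaches Move/Rotate, not a description of Piglet's internals.
static class DynamicArgumentSketch
{
    // Stand-in for Car.Move from the original program; it returns the
    // message instead of printing it so the result is easy to inspect.
    public static string Move(double distance)
    {
        return "Moving " + distance.ToString(CultureInfo.InvariantCulture);
    }

    public static string MoveWithParsedValue(object parsedValue)
    {
        // Treat the value like the dynamic `f.Distance` in the WhenFound callbacks.
        dynamic distance = parsedValue;

        // The call is bound at run time: a boxed int converts implicitly to
        // double, and a boxed double is passed through unchanged.
        return Move(distance);
    }
}
```

Calling DynamicArgumentSketch.MoveWithParsedValue(20) gives "Moving 20" and MoveWithParsedValue(50.2) gives "Moving 50.2", so a single Move(double) can serve both alternatives of the numeric rule.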

This is somewhat unintuitive behaviour and I understand your confusion on the matter. Your program would need a rule such as this anyhow, to avoid the issue with the literal 10 matching both patterns, but the fact that the order is important when using By<> is perhaps not exactly crystal clear.

stevehobbsdev commented 9 years ago

That's fantastic! Thanks for clearing that up. I've even put my booleans back in and they work too :-) The ordering makes sense. I think I was missing it because I expected the rules about double and integer to apply only within the rule they were being used in. I didn't realise that the rule for statement was interfering with the rules for what a variable declaration is.

I had one other question that I can't seem to figure out. I've now modified my code and split it so that it says "A statement is either a variable declaration or a command, followed by a new line". In the code in my original question, what was a statement is now a command.

Basically I want to use new lines as statement terminators, but I get:

Illegal token rotate. Expected

program.IsMadeUp.By(nameDecl).Followed.By(statements);

statements.IsMadeUp.ByListOf(statement);

statement.IsMadeUp
    .By(variableDecl).Followed.By(Environment.NewLine)
    .Or.By(command).Followed.By(Environment.NewLine);

command.IsMadeUp
    .By("engineoff").WhenFound(f =>
    {
        car.EngineOff();
        return null;
    })
    .Or.By("move").Followed.By(numericValue).As("Distance").WhenFound(f =>
    {
        car.Move(f.Distance);
        return null;
    })
    .Or.By("rotate").Followed.By(numericValue).As("Distance").WhenFound(f =>
    {
        car.Rotate(f.Distance);
        return null;
    });

With this input:

var input = @"
    car NewCar
    var x = 10.4
    rotate 20
    move 50.2
    rotate 20.7
    engineoff";

Or am I heading the wrong way with this?

Edit: to give you some context, one of the end goals is to write a YAML parser (far removed from the above example, I know!), and I'm trying to boil it down to the basic elements. YAML says that an element can be:

name: Steve
job: programming
country: "UK"

i.e. a mix of strings with or without quotes. It's a whitespace-sensitive language and the line breaks are significant. I'm not sure how to express that with Piglet. The JSON example is really what I need (as YAML is closely related), but the line breaks are proving to be an issue.