ed-cooper / Ebnf.Compiler

Compiles EBNF statements to C# DLLs
MIT License

Whitespace inside literals is silently removed #2

Closed: furesoft closed this issue 5 years ago.

furesoft commented 5 years ago

Hi, I have this EBNF: `Keyword = "struct" | "enum"; BrOp = "{"; BrCl = "}"; sc = " " | "\t"; s = sc, {sc};

Statement = Keyword, s, BrOp, s, BrCl;`

But when I use `Validation.IsStatement("enum { }", ...)`, it blocks my thread and never finishes parsing.
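For reference, here is a minimal hand-written recognizer for the grammar above. It is a sketch, assuming `sc = " " | "\t"` is meant as "one space or one tab". The class and method names (`GrammarSketch`, `MatchesStatement`) are hypothetical and are not part of the API that Ebnf.Compiler generates; the sketch only shows the result the reporter expected.

```csharp
using System;

// Hypothetical hand-written recognizer for the grammar in this issue (not generated code).
// Assumes sc = " " | "\t" means a single space or a single tab.
public static class GrammarSketch
{
    // Tries each literal in order; on the first match, advances pos past it.
    private static bool Literal(string input, ref int pos, params string[] options)
    {
        foreach (string option in options)
        {
            if (input.Substring(pos).StartsWith(option, StringComparison.Ordinal))
            {
                pos += option.Length;
                return true;
            }
        }
        return false;
    }

    // sc = " " | "\t";
    private static bool Sc(string input, ref int pos) => Literal(input, ref pos, " ", "\t");

    // s = sc, {sc};  each sc consumes exactly one character, so the loop always ends.
    private static bool S(string input, ref int pos)
    {
        if (!Sc(input, ref pos)) return false;
        while (Sc(input, ref pos)) { }
        return true;
    }

    // Statement = Keyword, s, BrOp, s, BrCl;  true only if the whole input matches.
    public static bool MatchesStatement(string input)
    {
        int pos = 0;
        return Literal(input, ref pos, "struct", "enum")
            && S(input, ref pos)
            && Literal(input, ref pos, "{")
            && S(input, ref pos)
            && Literal(input, ref pos, "}")
            && pos == input.Length;
    }
}
```

With this reading of the grammar, `"enum { }"` is accepted and `"enum{ }"` is rejected, because `s` requires at least one space or tab.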

ed-cooper commented 5 years ago

Hi, thanks for flagging up the issue.

This is a copy of the code generated when compiling the grammar above:

/*
 * IMPORTANT:
 * This code has been automatically generated by a tool.
 * Any modifications to this code will not be kept.
 * 
 * This file was last generated at 16/11/2018 19:24:44
 */

using System.Collections.Generic;

[assembly: System.CLSCompliant(true)]

namespace ebnf
{
    /// <summary>
    /// Provides methods for validating a string to the ebnf specification.
    /// </summary>
    public static class Validation
    {
        /// <summary>
        /// Keyword="struct"|"enum"
        /// </summary>
        /// <param name="input">The input string to validate.</param>
        /// <param name="remainder">The part of the string that did not match the given rule set; empty if full match.</param>
        /// <param name="parseTree">The parse tree produced from the input.</param>
        /// <param name="level">The internal level of recursion; starts at 0.</param>
        public static bool IsKeyword(string input, out string remainder, out Node parseTree, int level = 0)
        {
            parseTree = new Node(NodeType.Keyword);
            Node newNode;
            remainder = input;
            // "struct"
            if (remainder.StartsWith("struct"))
            {
                remainder = remainder.Substring(6);
                parseTree.Value = input.Substring(0, input.Length - remainder.Length);
                return true;
            }
            parseTree.Children.Clear();
            remainder = input;
            // "enum"
            if (remainder.StartsWith("enum"))
            {
                remainder = remainder.Substring(4);
                parseTree.Value = input.Substring(0, input.Length - remainder.Length);
                return true;
            }
            parseTree.Children.Clear();
            remainder = input;
            return false;
        }

        /// <summary>
        /// BrOp="{"
        /// </summary>
        /// <param name="input">The input string to validate.</param>
        /// <param name="remainder">The part of the string that did not match the given rule set; empty if full match.</param>
        /// <param name="parseTree">The parse tree produced from the input.</param>
        /// <param name="level">The internal level of recursion; starts at 0.</param>
        public static bool IsBrOp(string input, out string remainder, out Node parseTree, int level = 0)
        {
            parseTree = new Node(NodeType.BrOp);
            Node newNode;
            remainder = input;
            // "{"
            if (remainder.StartsWith("{"))
            {
                remainder = remainder.Substring(1);
                parseTree.Value = input.Substring(0, input.Length - remainder.Length);
                return true;
            }
            parseTree.Children.Clear();
            remainder = input;
            return false;
        }

        /// <summary>
        /// BrCl="}"
        /// </summary>
        /// <param name="input">The input string to validate.</param>
        /// <param name="remainder">The part of the string that did not match the given rule set; empty if full match.</param>
        /// <param name="parseTree">The parse tree produced from the input.</param>
        /// <param name="level">The internal level of recursion; starts at 0.</param>
        public static bool IsBrCl(string input, out string remainder, out Node parseTree, int level = 0)
        {
            parseTree = new Node(NodeType.BrCl);
            Node newNode;
            remainder = input;
            // "}"
            if (remainder.StartsWith("}"))
            {
                remainder = remainder.Substring(1);
                parseTree.Value = input.Substring(0, input.Length - remainder.Length);
                return true;
            }
            parseTree.Children.Clear();
            remainder = input;
            return false;
        }

        /// <summary>
        /// sc=""|"\t"
        /// </summary>
        /// <param name="input">The input string to validate.</param>
        /// <param name="remainder">The part of the string that did not match the given rule set; empty if full match.</param>
        /// <param name="parseTree">The parse tree produced from the input.</param>
        /// <param name="level">The internal level of recursion; starts at 0.</param>
        public static bool Issc(string input, out string remainder, out Node parseTree, int level = 0)
        {
            parseTree = new Node(NodeType.sc);
            Node newNode;
            remainder = input;
            // ""
            if (remainder.StartsWith(""))
            {
                remainder = remainder.Substring(0);
                parseTree.Value = input.Substring(0, input.Length - remainder.Length);
                return true;
            }
            parseTree.Children.Clear();
            remainder = input;
            // "\t"
            if (remainder.StartsWith("\t"))
            {
                remainder = remainder.Substring(2);
                parseTree.Value = input.Substring(0, input.Length - remainder.Length);
                return true;
            }
            parseTree.Children.Clear();
            remainder = input;
            return false;
        }

        /// <summary>
        /// s=sc,{sc}
        /// </summary>
        /// <param name="input">The input string to validate.</param>
        /// <param name="remainder">The part of the string that did not match the given rule set; empty if full match.</param>
        /// <param name="parseTree">The parse tree produced from the input.</param>
        /// <param name="level">The internal level of recursion; starts at 0.</param>
        public static bool Iss(string input, out string remainder, out Node parseTree, int level = 0)
        {
            parseTree = new Node(NodeType.s);
            Node newNode;
            remainder = input;
            // sc
            if (Issc(remainder, out remainder, out newNode, level + 1))
            {
                parseTree.Children.Add(newNode);
                // {sc}
                while (IssSubDef1(remainder, out remainder, out newNode, level))
                    parseTree.Children.AddRange(newNode.Children);
                parseTree.Value = input.Substring(0, input.Length - remainder.Length);
                return true;
            }
            parseTree.Children.Clear();
            remainder = input;
            return false;
        }

        /// <summary>
        /// sSubDef1=sc
        /// </summary>
        /// <param name="input">The input string to validate.</param>
        /// <param name="remainder">The part of the string that did not match the given rule set; empty if full match.</param>
        /// <param name="parseTree">The parse tree produced from the input.</param>
        /// <param name="level">The internal level of recursion; starts at 0.</param>
        private static bool IssSubDef1(string input, out string remainder, out Node parseTree, int level = 0)
        {
            parseTree = new Node(NodeType.s);
            Node newNode;
            remainder = input;
            // sc
            if (Issc(remainder, out remainder, out newNode, level + 1))
            {
                parseTree.Children.Add(newNode);
                parseTree.Value = input.Substring(0, input.Length - remainder.Length);
                return true;
            }
            parseTree.Children.Clear();
            remainder = input;
            return false;
        }

    }

    /// <summary>
    /// Represents a node in a parse tree.
    /// </summary>
    public class Node
    {
        #region Properties

        /// <summary>
        /// Gets or sets the string text that this node represents.
        /// </summary>
        public string Value { get; set; }

        /// <summary>
        /// Gets or sets the type of this node.
        /// </summary>
        public NodeType TypeName { get; set; }

        /// <summary>
        /// Gets or sets the collection of child nodes to this node.
        /// </summary>
        public List<Node> Children { get; set; }

        #endregion

        #region Constructor

        /// <summary>
        /// Creates a new instance of the <see cref="Node"/> class.
        /// </summary>
        public Node()
        {
            Children = new List<Node>();
        }

        /// <summary>
        /// Creates a new instance of the <see cref="Node"/> class.
        /// </summary>
        /// <param name="typeName">The type of the node.</param>
        public Node(NodeType typeName)
        {
            Value = "";
            Children = new List<Node>();
            TypeName = typeName;
        }

        #endregion

        #region Methods

        /// <summary>
        /// Returns a string that represents the current object.
        /// </summary>
        public override string ToString()
        {
            return TypeName.ToString() + ": Value";
        }

        #endregion
    }
    /// <summary>
    /// Represents all the possible types of node in the parse tree.
    /// </summary>
    public enum NodeType
    {
        /// <summary>
        /// Keyword="struct"|"enum"
        /// </summary>
        Keyword,
        /// <summary>
        /// BrOp="{"
        /// </summary>
        BrOp,
        /// <summary>
        /// BrCl="}"
        /// </summary>
        BrCl,
        /// <summary>
        /// sc=""|"\t"
        /// </summary>
        sc,
        /// <summary>
        /// s=sc,{sc}
        /// </summary>
        s,
    }
}

If you could produce a minimal working example that demonstrates the problem, that would be ideal; it would help identify which code is not being generated correctly.

ed-cooper commented 5 years ago

On further examination of the generated code, I have identified the following issue:

Before the `sc = " " | "\t";` statement is parsed, all whitespace is removed, which causes it to be parsed as `sc=""|"\t";`. The effect can be seen in the `Validation.Issc` method, which tests whether the input starts with an empty string. That test is always true, so `sc` matches without consuming any characters. The `{sc}` repetition in `Validation.Iss` (the `while (IssSubDef1(...))` loop) can therefore match an unlimited number of times without advancing through the input string, which blocks the thread.
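The failure mode can be reproduced in isolation. The sketch below uses hypothetical names (`ZeroWidthDemo`, `BrokenSc`, `RepeatWithGuard`) and is not code from Ebnf.Compiler: `BrokenSc` mimics the shape of the generated `Issc` after stripping, and `RepeatWithGuard` is a `{sc}` loop with an added progress check, a common safeguard in hand-written parsers that stops a repetition when an iteration consumes no input.

```csharp
using System;

// Hypothetical demo of a zero-width match inside a repetition.
public static class ZeroWidthDemo
{
    // Mimics the stripped rule sc=""|"\t": the "" alternative always matches
    // and consumes zero characters, so the "\t" alternative is never reached.
    public static bool BrokenSc(string remainder, out string rest)
    {
        if (remainder.StartsWith("", StringComparison.Ordinal)) // always true
        {
            rest = remainder.Substring(0); // nothing consumed
            return true;
        }
        rest = remainder;
        return false;
    }

    // A {sc} loop with a progress guard: it stops as soon as an iteration
    // consumes no input, instead of spinning forever like the generated loop.
    public static int RepeatWithGuard(string input, out string rest)
    {
        rest = input;
        int matches = 0;
        while (BrokenSc(rest, out string next))
        {
            if (next.Length == rest.Length) break; // no progress: stop repeating
            rest = next;
            matches++;
        }
        return matches;
    }
}
```

The guard prevents the hang, but the parse is still wrong: even an input that starts with a space is never consumed, because the space literal itself was lost. The root fix therefore has to be in the whitespace-stripping step.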

Therefore, as a temporary workaround, I suggest trying the following alternative statement; if you could let me know whether it works, that would be ideal:

`sc = "\u0020" | "\t";`

ed-cooper commented 5 years ago

The specific line with the issue is here.

This regular expression needs to be updated so that it does not remove whitespace inside quotation marks.
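The exact expression used by the compiler is not reproduced in this thread, so the following is only an assumed replacement, sketched with the standard .NET `Regex` class. It matches either a double-quoted literal (kept as-is) or a run of whitespace outside literals (removed). The names `GrammarPreprocessor` and `StripWhitespaceOutsideLiterals` are hypothetical, and the pattern does not handle escaped quote characters inside a literal.

```csharp
using System.Text.RegularExpressions;

// Hypothetical preprocessing step: strip whitespace, but not inside "..." literals.
public static class GrammarPreprocessor
{
    // Group 1 captures a complete double-quoted literal; otherwise \s+ matches
    // whitespace that sits outside any literal.
    private static readonly Regex LiteralOrWhitespace =
        new Regex(@"(""[^""]*"")|\s+", RegexOptions.Compiled);

    // Keeps literals verbatim and deletes all other whitespace.
    public static string StripWhitespaceOutsideLiterals(string grammar) =>
        LiteralOrWhitespace.Replace(grammar, m => m.Groups[1].Success ? m.Value : "");
}
```

Applied to the reporter's `sc = " " | "\t";`, this yields `sc=" "|"\t";`, whereas a plain `\s+` removal yields the problematic `sc=""|"\t";`.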