datalust / superpower

A C# parser construction toolkit with high-quality error reporting
Apache License 2.0

How to handle consecutive delimiters #158

Closed: woha closed this issue 5 months ago

woha commented 5 months ago

I'm trying to parse an input whose elements are separated by semicolons, but I can't get it working when two delimiters appear in a row.

Example: var input = "TEST123;TEST456;;HALLO";

I'm getting: Syntax error (line 1, column 17): unexpected `;`, expected TgwComCon value.

(Using .ManyDelimitedBy(Token.EqualTo(TgwComConToken.Semicolon)).AtEnd();).

Is there a way to return null or string.Empty for "empty" elements?

marklauter commented 5 months ago

I'm not 100% sure, but I think it's up to the element parser (the "this" parser that ManyDelimitedBy is called on) to handle an empty or zero-length result. In parser.ManyDelimitedBy(...), parser itself must be able to succeed on something like "" or "\0" input. Based on the error, I'd guess that here it is the TgwComCon parser.

This is based on my reading of the ManyDelimitedBy method:

        public static TextParser<T[]> ManyDelimitedBy<T, U>(this TextParser<T> parser, TextParser<U> delimiter)
        {
            if (parser == null) throw new ArgumentNullException(nameof(parser));
            if (delimiter == null) throw new ArgumentNullException(nameof(delimiter));

            return parser.Then(first => delimiter.IgnoreThen(parser).Many().Select(rest => ArrayEnumerable.Cons(first, rest)))
                .OptionalOrDefault(Array.Empty<T>());
        }
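
A minimal sketch of that idea, assuming plain text input (TextParser) rather than the token list used in the question. The names DelimitedExample, Field and Fields are made up for illustration, and the Superpower NuGet package is assumed to be referenced. The element parser accepts zero or more non-semicolon characters, so it still succeeds between two consecutive delimiters and yields an empty string:

```csharp
using System;
using Superpower;
using Superpower.Parsers;

public static class DelimitedExample
{
    // Element parser: zero or more characters other than ';'.
    // Many() also succeeds on zero characters, so an empty field yields "".
    static readonly TextParser<string> Field =
        Character.Except(';').Many().Select(chars => new string(chars));

    // Fields separated by ';'. AtEnd() requires the whole input to be consumed.
    public static readonly TextParser<string[]> Fields =
        Field.ManyDelimitedBy(Character.EqualTo(';')).AtEnd();

    public static void Main()
    {
        var fields = Fields.Parse("TEST123;TEST456;;HALLO");

        // Empty elements show up as empty strings between the separators.
        Console.WriteLine(string.Join("|", fields)); // TEST123|TEST456||HALLO
    }
}
```

With a tokenizer in front, the same approach would probably mean making the value parser optional with a default, but that depends on how the tokens are defined, so treat this only as a starting point.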

woha commented 5 months ago

Thank you @marklauter. I think you are absolutely right. I tried that but couldn't get it to work. Could you please give me an example or an idea of how to parse such an empty result?

marklauter commented 5 months ago

Full disclosure, I'm new to parser combinators and am still working my way through Graham Hutton's and Erik Meijer's 1996 paper, but I will attempt a sample today after work.

marklauter commented 5 months ago

I wrote a very simple tokenizer with a "null value" parser that recognizes consecutive ';' characters. The semicolons can be separated by whitespace. The README shows the output. I hope it's useful.

https://github.com/marklauter/superpower-delimited-combinator

woha commented 5 months ago

@marklauter This works perfectly!

Thank you very much for your time and help, highly appreciated.

Looks like I should start reading Graham Hutton's and Erik Meijer's paper as well - I wish you some enlightening moments reading it.

Once again - thank you!

marklauter commented 5 months ago

I'm so glad this idea worked for you. Happy parsing.