datalust / superpower

A C# parser construction toolkit with high-quality error reporting
Apache License 2.0

How to parse multiple TextParser from TokenListParser #89

Closed: HofmeisterAn closed this issue 5 years ago.

HofmeisterAn commented 5 years ago

I'm trying to use Superpower to parse a text file. I think my tokenizer works pretty well; however, I'm struggling to write a working parser and I'm looking for some help and advice.

My file looks like the following example:

```
#(
#Dictionary
#('Foo' ' ->'
#RefFoo)
#('Bar' ' ->'
#RefBar))
```

I use this code to generate the token list:

```csharp
using Superpower;
using Superpower.Model;
using Superpower.Parsers;
using Superpower.Tokenizers;

internal enum SmllToken
{
  Reference,            // ' ->'
  BlockStatementBegin,  // (
  BlockStatementEnd,    // )
  Identifier,           // Dictionary, RefFoo, RefBar
  String,               // 'Foo', 'Bar' (quotes included)
}

internal static class SmllTokenizer
{
  private static TextParser<Unit> IdentifierToken { get; } =
    Span.Regex("[A-Za-z]+").Value(Unit.Value);

  private static TextParser<Unit> StringToken { get; } =
    Span.Regex("'[A-Za-z]+'").Value(Unit.Value);

  public static Tokenizer<SmllToken> Instance { get; } =
    new TokenizerBuilder<SmllToken>()
    .Ignore(Span.WhiteSpace)
    .Ignore(Span.EqualTo('#'))   // the leading '#' markers carry no meaning
    .Match(Character.EqualTo('('), SmllToken.BlockStatementBegin)
    .Match(Character.EqualTo(')'), SmllToken.BlockStatementEnd)
    .Match(Span.EqualTo("' ->'"), SmllToken.Reference)
    .Match(IdentifierToken, SmllToken.Identifier)
    .Match(StringToken, SmllToken.String)
    .Build();
}
```
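Before debugging the parser, it can help to confirm what the tokenizer actually emits. The following is a minimal sketch, assuming a console app that references the Superpower NuGet package. It reuses the recognizers above, but with `Character.EqualTo('#')` in place of `Span.EqualTo('#')` (an assumption, only to stay on an overload that is certainly available), and prints each token's kind and text. `Program` and `Input` are hypothetical names used for illustration.

```csharp
using System;
using System.Linq;
using Superpower;
using Superpower.Parsers;
using Superpower.Tokenizers;

internal enum SmllToken { Reference, BlockStatementBegin, BlockStatementEnd, Identifier, String }

internal static class SmllTokenizer
{
    public static Tokenizer<SmllToken> Instance { get; } =
        new TokenizerBuilder<SmllToken>()
            .Ignore(Span.WhiteSpace)
            .Ignore(Character.EqualTo('#'))   // assumption: equivalent to Span.EqualTo('#')
            .Match(Character.EqualTo('('), SmllToken.BlockStatementBegin)
            .Match(Character.EqualTo(')'), SmllToken.BlockStatementEnd)
            .Match(Span.EqualTo("' ->'"), SmllToken.Reference)
            .Match(Span.Regex("[A-Za-z]+"), SmllToken.Identifier)
            .Match(Span.Regex("'[A-Za-z]+'"), SmllToken.String)
            .Build();
}

internal static class Program
{
    // Hypothetical: the sample file from above, without the trailing spaces.
    internal const string Input =
        "#(\n#Dictionary\n#('Foo' ' ->'\n#RefFoo)\n#('Bar' ' ->'\n#RefBar))\n";

    private static void Main()
    {
        // Tokenize throws a ParseException if some input matches no recognizer.
        var tokens = SmllTokenizer.Instance.Tokenize(Input);

        // Print one line per token: its kind, then the exact source text it covers.
        foreach (var token in tokens)
            Console.WriteLine($"{token.Kind,-20} {token.ToStringValue()}");

        Console.WriteLine($"{tokens.Count()} tokens");
    }
}
```

For the sample file this should report 13 tokens: a BlockStatementBegin, the Identifier `Dictionary`, then for each entry BlockStatementBegin, String, Reference, Identifier, BlockStatementEnd, and finally the outer BlockStatementEnd. If that matches, the tokenizer is doing its job and the problem lies in the parser. Note that the String token's text keeps its quotes (`'Foo'`).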

After that, I would like to parse the values from my text file into something like { Foo: RefFoo }, { Bar: RefBar }. Unfortunately, I'm not able to correctly parse the repeated ('Foo' ' ->' #RefFoo) entries. I always get various syntax errors about invalid identifiers. I tried a couple of different ways to write the parser, and none of them worked. Most of the time they looked like this:

```csharp
private static readonly TextParser<string> Foo =
  from chars in Character.AnyChar.Many()
  select new string(chars);

private static readonly TextParser<string> Bar =
  from chars in Character.AnyChar.Many()
  select new string(chars);

private static readonly TokenListParser<SmllToken, object> None =
  Token.EqualTo(SmllToken.Reference)
  .Or(Token.EqualTo(SmllToken.BlockStatementBegin))
  .Or(Token.EqualTo(SmllToken.BlockStatementEnd))
  .Value((object)Unit.Value);

private static readonly TokenListParser<SmllToken, object> Pair =
  Token.EqualToValue(SmllToken.Identifier, "Dictionary")
  .Then(x => Token.EqualTo(SmllToken.String).Apply(Foo))
  .Then(y => Token.EqualTo(SmllToken.Identifier).Apply(Bar))
  .Select(foo => (object)foo);

private static readonly TokenListParser<SmllToken, object> Values =
  None.Or(Pair);

public static readonly TokenListParser<SmllToken, IEnumerable<object>> Instance =
  Values.Many().AtEnd().Select(value => value.AsEnumerable());
```

I think I'm missing the part that parses the surrounding brackets ( and ), as well as the ' ->' token (which appears twice). It would be great if someone could help me understand how to parse the example file (or string) correctly, and how Superpower works.
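One possible approach, sketched under the assumption that the input always has the shape `( Dictionary ( 'key' ' ->' Target ) ... )`: describe each bracketed form as its own `TokenListParser` using Superpower's LINQ query syntax, match the brackets and the `' ->'` token explicitly, and let `Many()` repeat the inner entry. The names `SmllParser`, `Key`, `Target`, `Entry`, `DictionaryBlock`, `Program` and the `Dictionary<string, string>` result type are hypothetical choices for illustration, not part of the original code. Instead of `Apply` with `Character.AnyChar.Many()`, the sketch reads each token's text with `ToStringValue()`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Superpower;
using Superpower.Parsers;
using Superpower.Tokenizers;

internal enum SmllToken { Reference, BlockStatementBegin, BlockStatementEnd, Identifier, String }

internal static class SmllTokenizer
{
    // Same recognizers as in the question; '#' ignored via Character.EqualTo('#').
    public static Tokenizer<SmllToken> Instance { get; } =
        new TokenizerBuilder<SmllToken>()
            .Ignore(Span.WhiteSpace)
            .Ignore(Character.EqualTo('#'))
            .Match(Character.EqualTo('('), SmllToken.BlockStatementBegin)
            .Match(Character.EqualTo(')'), SmllToken.BlockStatementEnd)
            .Match(Span.EqualTo("' ->'"), SmllToken.Reference)
            .Match(Span.Regex("[A-Za-z]+"), SmllToken.Identifier)
            .Match(Span.Regex("'[A-Za-z]+'"), SmllToken.String)
            .Build();
}

internal static class SmllParser
{
    // 'Foo' -> "Foo": the String token's text includes the quotes, so strip them.
    static readonly TokenListParser<SmllToken, string> Key =
        Token.EqualTo(SmllToken.String).Select(t => t.ToStringValue().Trim('\''));

    // RefFoo -> "RefFoo"
    static readonly TokenListParser<SmllToken, string> Target =
        Token.EqualTo(SmllToken.Identifier).Select(t => t.ToStringValue());

    // ( 'Foo' ' ->' RefFoo ): the brackets and the arrow must be matched,
    // but only the key and the target end up in the result.
    static readonly TokenListParser<SmllToken, KeyValuePair<string, string>> Entry =
        from open in Token.EqualTo(SmllToken.BlockStatementBegin)
        from key in Key
        from arrow in Token.EqualTo(SmllToken.Reference)
        from target in Target
        from close in Token.EqualTo(SmllToken.BlockStatementEnd)
        select new KeyValuePair<string, string>(key, target);

    // ( Dictionary <entry> <entry> ... ): Many() repeats Entry until the next
    // token is the outer ')', which Entry rejects without consuming it.
    static readonly TokenListParser<SmllToken, Dictionary<string, string>> DictionaryBlock =
        from open in Token.EqualTo(SmllToken.BlockStatementBegin)
        from keyword in Token.EqualToValue(SmllToken.Identifier, "Dictionary")
        from entries in Entry.Many()
        from close in Token.EqualTo(SmllToken.BlockStatementEnd)
        select entries.ToDictionary(e => e.Key, e => e.Value);

    // AtEnd() turns leftover, unparsed tokens into an error.
    public static TokenListParser<SmllToken, Dictionary<string, string>> Instance { get; } =
        DictionaryBlock.AtEnd();
}

internal static class Program
{
    // Hypothetical: the sample file from the question, without trailing spaces.
    internal const string Input =
        "#(\n#Dictionary\n#('Foo' ' ->'\n#RefFoo)\n#('Bar' ' ->'\n#RefBar))\n";

    private static void Main()
    {
        var tokens = SmllTokenizer.Instance.Tokenize(Input);
        var result = SmllParser.Instance.Parse(tokens);   // throws ParseException on bad input
        foreach (var pair in result)
            Console.WriteLine($"{{ {pair.Key}: {pair.Value} }}");
    }
}
```

As far as I understand Superpower's semantics, this also hints at why the attempt in the question fails: `Pair` consumes `Dictionary` and then immediately requires a String token, but the next token is the `(` of the first entry. Because `Pair` has already consumed input at that point, `Many()` reports an error instead of stopping. In addition, `.Then(x => ...)` keeps only the result of the last parser, so the key would be lost even if the match succeeded.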