benjamin-hodgson / Pidgin

A lightweight and fast parsing library for C#.
https://www.benjamin.pizza/Pidgin/
MIT License
914 stars 70 forks

Need help with `Try` or `Int(10)` #153

Open snailcatcher opened 8 months ago

snailcatcher commented 8 months ago

Hi there,

I don't understand how either Int(10) or Try(...) works. In the example below I want to parse either an integer or a string containing only letters and digits. So I try to read Int(10), and if that fails I want to go back and read the input as a string of LetterOrDigit. I don't see what I'm doing wrong here.

using Pidgin;
using Xunit;
using Xunit.Abstractions;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;

namespace ExpressionParserPOC;

public class MiniRepro
{
   private readonly ITestOutputHelper _testOutputHelper;

   public MiniRepro(ITestOutputHelper testOutputHelper)
   {
      _testOutputHelper = testOutputHelper;
   }

   private static readonly Parser<char, string> PParser =
      OneOf(
            Try(Int(10).Select(x => x.ToString())),
            Try(LetterOrDigit.AtLeastOnceString())
         )
         .Before(End);

   [Fact]
   public void Test_11()
   {
      var result = PParser.ParseOrThrow("11");
      _testOutputHelper.WriteLine(result); // 11
   }

   [Fact]
   public void Test_a1()
   {
      var result = PParser.ParseOrThrow("a1");
      _testOutputHelper.WriteLine(result); // a1
   }

   [Fact]
   public void Test_1a()
   {
      var result = PParser.ParseOrThrow("1a");
      _testOutputHelper.WriteLine(result); // Parse error. unexpected a expected end of input at line 1, col 2
   }

   [Fact]
   public void Test_aa()
   {
      var result = PParser.ParseOrThrow("aa");
      _testOutputHelper.WriteLine(result); // aa
   }
}

benjamin-hodgson commented 8 months ago

Sounds like you're asking about the 1a example? Int(10) consumes the 1 and succeeds, so there's no need for Try to backtrack and the second arm of the OneOf never gets attempted.

You could try something like

Try(Int(10).Select(x => x.ToString()).Before(End))
    .Or(LetterOrDigit.AtLeastOnceString())

But I'm a little unclear on what you're trying to achieve here. Why not just use LetterOrDigit.AtLeastOnceString(), since that'll accept a numeric string just as easily?
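
To make the explanation above concrete, here is a minimal sketch of the two behaviours. The class and field names (`TryDemo`, `FirstArmOnly`, `IntOrWord`) are made up for illustration, and the trailing `.Before(End)` on the second parser is an assumption carried over from the original repro rather than part of the suggestion itself.

```csharp
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;

// Hypothetical names, for illustration only.
public static class TryDemo
{
    // The first arm of the original parser on its own.
    // On "1a" it succeeds after reading "1" and leaves "a" unconsumed,
    // so there is no failure for Try to backtrack from.
    public static readonly Parser<char, string> FirstArmOnly =
        Int(10).Select(x => x.ToString());

    // The End check sits inside Try, so a trailing non-digit makes the
    // whole first arm fail and rewind before the second arm is attempted.
    public static readonly Parser<char, string> IntOrWord =
        Try(Int(10).Select(x => x.ToString()).Before(End))
            .Or(LetterOrDigit.AtLeastOnceString())
            .Before(End); // assumption: still reject trailing input such as "a1!"
}
```

With these definitions, `TryDemo.FirstArmOnly.ParseOrThrow("1a")` should return "1", while `TryDemo.IntOrWord.ParseOrThrow("1a")` should return "1a" because the integer arm is abandoned once `End` does not match.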

snailcatcher commented 8 months ago

The real use case I'm struggling with is that I want to differentiate between an Int32 (with or without a leading +/- sign) and everything else. In my case I want to prevent numbers with a decimal point from being read as Int32 values. Later I want to read Int32, Real, or everything else.


using Pidgin;
using Xunit;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;

namespace ExpressionParserPOC;

public interface IParam { }

public class IntParam(int value) : IParam
{
   public int Value { get; } = value;
}

public class InvalidParam(string value) : IParam
{
   public string Value { get; } = value;
}

public class MiniRepro
{
   private static readonly Parser<char, IParam> PParser =
      OneOf(
            Try(Int(10).Select(x => new IntParam(x)).Cast<IParam>()),
            Try(Any.AtLeastOnceString().Select(x => new InvalidParam(x)).Cast<IParam>())
         )
         .Before(End);

   public static IEnumerable<object[]> SampleData()
   {
      yield return ["1", new IntParam(1)]; // success
      yield return ["2", new IntParam(2)]; // success
      yield return ["1000", new IntParam(1000)]; // success
      yield return ["1.0", new InvalidParam("1.0")]; // Parse error. unexpected . expected end of input at line 1, col 2
      yield return ["1.000", new InvalidParam("1.000")]; // Parse error. unexpected . expected end of input at line 1, col 2
      yield return ["1.000.000", new InvalidParam("1.000.000")]; // Parse error. unexpected . expected end of input at line 1, col 2
   }

   [Theory]
   [MemberData(nameof(SampleData))]
   public void Test(string input, IParam expected)
   {
      var result = PParser.ParseOrThrow(input);
      Assert.Equivalent(expected, result);
   }
}

snailcatcher commented 8 months ago

I've partly solved my problem, but I'm not very pleased with the solution I found. It seems very complex and hacky, it does not support e^x for doubles, and I cannot get my InvalidParam working for everything the parser cannot handle. It would be great if I could simply use Int(10) and Real for this job, but I can't get my head around a working solution.

It would be amazing if someone could show me how to get around this.

using System.Globalization;
using System.Text.Json;
using ExpressionParserPOC.PidginParser;
using Pidgin;
using Xunit;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;

namespace ExpressionParserPOC;

public interface IParam { }

public class IntParam(int value) : IParam
{
   public int Value { get; } = value;
}

public class DoubleParam(double value) : IParam
{
   public double Value { get; } = value;
}

public class InvalidParam(string value) : IParam
{
   public string Value { get; } = value;
}

public class MiniRepro
{
   private static readonly Parser<char, IParam> PNumberParameter =
      Map((sign, units, dot, nanos) =>
            {
               var leading = sign.HasValue ? sign.Value.ToString() : string.Empty;
               if (dot.HasValue && nanos.HasValue)
               {
                  var success = double.TryParse($"{leading}{units}.{nanos.Value}", NumberStyles.AllowLeadingSign | NumberStyles.AllowDecimalPoint, CultureInfo.InvariantCulture, out var result);
                  if (success)
                     return new DoubleParam(result);
               }
               else if (!dot.HasValue && !nanos.HasValue)
               {
                  var success = int.TryParse($"{leading}{units}", NumberStyles.AllowLeadingSign, CultureInfo.InvariantCulture, out var result);
                  if (success)
                     return new IntParam(result);
               }

               return (IParam?) null;
            },
            OneOf(Char('-'), Char('+')).Optional(),
            Digit.AtLeastOnceString(),
            Char('.').Optional(),
            Digit.AtLeastOnceString().Optional())
         .Assert(x => x != null, "Could not parse a number")
         .Select(x => x!);

   public static IEnumerable<object[]> SampleData4()
   {
      yield return ["1", new IntParam(1)]; // success
      yield return ["2", new IntParam(2)]; // success
      yield return ["1000", new IntParam(1000)]; // success
      yield return ["1.0", new DoubleParam(1.0)]; // success
      yield return ["1.000", new DoubleParam(1.000)]; // success
      yield return ["1000.000", new DoubleParam(1000.000)]; // success

      yield return ["1.000.000", new InvalidParam("1.000.000")]; // Assert.Equivalent() Failure: Mismatched value on member 'Value'
                                                                 // Expected: "1.000.000"
                                                                 // Actual:   1
   }

   [Theory]
   [MemberData(nameof(SampleData4))]
   public void Test4(string input, IParam expected)
   {
      var result = PNumberParameter.ParseOrThrow(input);
      Assert.Equivalent(expected, result);
   }
}
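
One possible way to get the Int32 / Real / everything-else split is to apply the earlier suggestion from this thread (moving the `End` check inside `Try`) to each numeric arm. The following is only a sketch under some assumptions: it relies on the `Int(10)` and `Real` parsers mentioned above, it uses records instead of the classes in the repro so that results compare by value, and the names `ParamParser` and `PParam` are made up for illustration. It has not been checked against Int32 overflow or exponent edge cases.

```csharp
using Pidgin;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;

public interface IParam { }

// Records (an assumption, swapped in for the thread's classes) give value equality.
public record IntParam(int Value) : IParam;
public record DoubleParam(double Value) : IParam;
public record InvalidParam(string Value) : IParam;

// Hypothetical name, for illustration only.
public static class ParamParser
{
    public static readonly Parser<char, IParam> PParam =
        OneOf(
            // Accept an integer only if it spans the whole input; otherwise rewind.
            Try(Int(10).Before(End)).Select(x => (IParam)new IntParam(x)),
            // Same idea for a real number.
            Try(Real.Before(End)).Select(x => (IParam)new DoubleParam(x)),
            // Fallback: take everything as-is. Any consumes up to the end of
            // the input, so no extra End check is needed on this arm.
            Any.AtLeastOnceString().Select(x => (IParam)new InvalidParam(x))
        );
}
```

Under these assumptions, `ParamParser.PParam.ParseOrThrow("1")` should give `IntParam(1)`, "1.0" should give `DoubleParam(1.0)`, and "1.000.000" should fall through to `InvalidParam("1.000.000")`, because neither numeric arm reaches the end of the input. The order of the arms matters: with `Real` first, plain integers might be read as doubles.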