datalust / superpower

A C# parser construction toolkit with high-quality error reporting
Apache License 2.0

TokenizerBuilder and Mapping CLR Types #153

Closed martinkoslof closed 1 year ago

martinkoslof commented 1 year ago

Hello,

I've been taking a crash course in Superpower over the last few days and I have a working prototype. However, there is one thing I'm missing, and I'd like some advice on how to proceed.

At the moment I am using the default TokenizerBuilder. I really don't want to write a custom tokenizer because 90% of everything I need is already working. Here is the current code for that:

    var tokenizer = new TokenizerBuilder<QueryToken>()
        .Ignore(Span.WhiteSpace)
        .Match(Character.EqualTo('('), QueryToken.LParen)
        .Match(Character.EqualTo(')'), QueryToken.RParen)
        .Match(Span.EqualTo("gt"), QueryToken.GreaterThan, requireDelimiters: true)
        .Match(Span.EqualTo("ge"), QueryToken.GreaterThanEquals, requireDelimiters: true)
        .Match(Span.EqualTo("eq"), QueryToken.Equals, requireDelimiters: true)
        .Match(Span.EqualTo("neq"), QueryToken.NotEquals, requireDelimiters: true)
        .Match(Span.EqualTo("lt"), QueryToken.LessThan, requireDelimiters: true)
        .Match(Span.EqualTo("le"), QueryToken.LessThanEquals, requireDelimiters: true)
        .Match(Span.EqualTo("startswith"), QueryToken.StartsWith)
        .Match(Span.EqualTo("endswith"), QueryToken.EndsWith)
        .Match(Span.EqualTo("contains"), QueryToken.Contains)
        .Match(Span.EqualTo("true"), QueryToken.Bool)
        .Match(Span.EqualTo("false"), QueryToken.Bool)
        .Match(Numerics.DecimalDouble, QueryToken.Number)
        .Match(Numerics.IntegerInt32, QueryToken.Number)
        .Match(Numerics.IntegerInt64, QueryToken.Number)
        .Match(QueryDateTimeParser.DateTime, QueryToken.DateTime)
        .Match(Span.EqualTo("and"), QueryToken.And, requireDelimiters: true)
        .Match(Span.EqualTo("or"), QueryToken.Or, requireDelimiters: true)
        .Match(Character.Letter.IgnoreThen(Character.LetterOrDigit.AtLeastOnce()), QueryToken.Field, requireDelimiters: true)
        .Match(String, QueryToken.Text)
        .Match(Character.EqualTo(','), QueryToken.Comma)
        .Build();

    return tokenizer.Tokenize(filter);

Notice the matches I have for Numerics, String and QueryDateTimeParser? I borrowed the DateTime parser verbatim from here: https://github.com/datalust/superpower/blob/dev/sample/DateTimeTextParser/DateTimeTextParser.cs. Unfortunately, it doesn't appear to be working for me: I get parsing errors whenever I provide a date/time value. Is this the proper way to integrate it into an existing tokenizer? I realize there is a public static DateTime Parse(string input) method in that parser, but I want it to be indirectly invoked via the template.
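One thing that may be worth ruling out for the DateTime errors is recognizer order. As far as the TokenizerBuilder source suggests, the built tokenizer tries each Match in the order it was added and takes the first one that succeeds, so a numeric recognizer registered ahead of the date recognizer can claim the leading digits of a date. The sketch below illustrates this with a deliberately small setup. DemoToken, RecognizerOrderDemo and the IsoDate regex are hypothetical stand-ins (IsoDate is not the sample's QueryDateTimeParser.DateTime), and the code assumes the Superpower NuGet package is referenced.

```csharp
using System;
using System.Linq;
using Superpower;
using Superpower.Model;
using Superpower.Parsers;
using Superpower.Tokenizers;

// Hypothetical two-kind token enum, only for this demonstration.
public enum DemoToken { Number, DateTime }

public static class RecognizerOrderDemo
{
    // Assumed stand-in for a date recognizer: an ISO-8601-style date
    // with an optional time part (yyyy-MM-dd or yyyy-MM-ddTHH:mm[:ss]).
    static readonly TextParser<TextSpan> IsoDate =
        Span.Regex(@"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2})?)?");

    // Tokenizes the input and returns the token kinds, with either the
    // date recognizer or the numeric recognizer registered first.
    public static DemoToken[] Kinds(string input, bool dateFirst)
    {
        var builder = new TokenizerBuilder<DemoToken>().Ignore(Span.WhiteSpace);

        builder = dateFirst
            ? builder.Match(IsoDate, DemoToken.DateTime)
                     .Match(Numerics.Decimal, DemoToken.Number)
            : builder.Match(Numerics.Decimal, DemoToken.Number)
                     .Match(IsoDate, DemoToken.DateTime);

        return builder.Build().Tokenize(input).Select(t => t.Kind).ToArray();
    }
}
```

With dateFirst: true, the input 2020-01-31 should come back as a single DateTime token; with dateFirst: false, the leading 2020 is taken as a Number before the date recognizer is ever consulted. If the same thing is happening in the tokenizer above, moving the QueryDateTimeParser.DateTime match ahead of the Numerics matches may be enough.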

The other issue, and maybe I'm missing a better way to do this, is that I have several different CLR types I need to support: int, decimal, double, float, Guid, DateTime and enum, for example. For numeric types there appears to be a generic Numerics class. However, how do I actually map these to the proper CLR property type when building the expression? For example, this is what I have now (on the parser side of things):

    private static readonly TokenListParser<QueryToken, Expression> Constant =
        Token.EqualTo(QueryToken.Number).Apply(Numerics.IntegerInt32).Select(n => (Expression)Expression.Constant(n))
            .Or(Token.EqualTo(QueryToken.Text).Apply(String).Select(t => (Expression)Expression.Constant(t)))
            .Or(Token.EqualTo(QueryToken.Bool).Select(t => (Expression)Expression.Constant(bool.Parse(t.ToStringValue()))))
            .Or(Token.EqualTo(QueryToken.DateTime).Apply(QueryDateTimeParser.DateTimeOnly).Select(t => (Expression)Expression.Constant(t)));

As noted above, I can't get any tokens mapped to the QueryToken.DateTime kind yet, but even once that works, how do I explicitly set the expressions for my double, float, Guid and other types? In the past I would have a parsed "object" value and do something like this manually, in a helper method that is given the property type of the field:

            switch (propertyType)
            {
                case Type _ when propertyType.IsEnum:
                    Enum.TryParse(propertyType, propertyValue, out var enumResult);
                    return (enumResult != null, enumResult!);
                case Type _ when propertyType == typeof(int) || propertyType == typeof(int?):
                    var validInt = int.TryParse(propertyValue, out var asInt);
                    return (validInt, asInt);
                case Type _ when propertyType == typeof(long) || propertyType == typeof(long?):
                    var validLong = long.TryParse(propertyValue, out var asLong);
                    return (validLong, asLong);
                case Type _ when propertyType == typeof(double) || propertyType == typeof(double?):
                    var validDouble = double.TryParse(propertyValue, out var asDouble);
                    return (validDouble, asDouble);
                case Type _ when propertyType == typeof(float) || propertyType == typeof(float?):
                    var validFloat = float.TryParse(propertyValue, out var asFloat);
                    return (validFloat, asFloat);
                case Type _ when propertyType == typeof(decimal) || propertyType == typeof(decimal?):
                    var validDecimal = decimal.TryParse(propertyValue, out var asDecimal);
                    return (validDecimal, asDecimal);
                case Type _ when propertyType == typeof(Guid) || propertyType == typeof(Guid?):
                    var validGuid = Guid.TryParse(propertyValue, out var asGuid);
                    return (validGuid, asGuid);
                case Type _ when propertyType == typeof(DateTime) || propertyType == typeof(DateTime?):
                    var validDateTime = DateTime.TryParse(propertyValue, out var asDateTime);
                    return (validDateTime, asDateTime);
                case Type _ when propertyType == typeof(bool) || propertyType == typeof(bool?):
                    var validBool = bool.TryParse(propertyValue, out var asBool);
                    return (validBool, asBool);
                default:
                    return (true, propertyValue); //string is fine as is
            }

Then I would build my expression constant from this converted value. I'm not sure how to use my tokenizer and parsers to accomplish something similar. I hope this question makes sense.
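One way to carry that switch-style conversion over to a parser is to keep the literal as raw text until the field's property type is known, and only then build the constant. The following is a minimal sketch using only the BCL, not Superpower. TypedConstant and For are hypothetical names, and it assumes TypeDescriptor's built-in converters are acceptable in place of the individual TryParse calls (they throw on bad input rather than returning false).

```csharp
using System;
using System.ComponentModel;
using System.Linq.Expressions;

// Hypothetical helper: turns literal text into a constant expression
// whose type matches the target property.
public static class TypedConstant
{
    public static ConstantExpression For(Type propertyType, string literal)
    {
        // Unwrap Nullable<T> so that int? is converted like int.
        var target = Nullable.GetUnderlyingType(propertyType) ?? propertyType;

        object? value = target.IsEnum
            ? Enum.Parse(target, literal, ignoreCase: true)
            // Built-in converters cover int, long, double, float, decimal,
            // Guid, DateTime, bool and string, using the invariant culture.
            : TypeDescriptor.GetConverter(target).ConvertFromInvariantString(literal);

        // Passing propertyType keeps the constant typed as, e.g., double?
        // rather than double when the property is nullable.
        return Expression.Constant(value, propertyType);
    }
}
```

For example, TypedConstant.For(typeof(double), "10") produces a constant of type double, so it can be compared against a double property without a type mismatch.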

nblumhardt commented 1 year ago

Hi Martin! Thanks for your message. Unfortunately I'm low on time to dig in right now - I do know a few people chime in on parsers, superpower, and some other Stack Overflow tags, though, so may be quicker to get some help if you post your question there, too? Cheers!

martinkoslof commented 1 year ago

Hi Nick,

I understand completely. If you have any ideas or suggestions that don't require overextending yourself, let me know. I skimmed Stack Overflow and didn't really see anything related.

I can use a regex match in the TokenizerBuilder to set explicit tokens for DateTime and Guid. I have booleans handled, and strings are self-explanatory. I'm not sure how to handle the numeric types. For example, say I have a property called SecurityLevel that is defined as a double. Within a string I need to parse, I receive: "SecurityLevel eq 10". I believe any out-of-the-box parser here would assume it's an Int32, and then, when building the expression for SecurityLevel, an error would occur because the code is attempting to assign an Int32 expression to the double property. I'm not sure if there is an obvious solution I'm just not aware of.
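For the Int32-versus-double case specifically, one possible approach is to let the literal parse as whatever the tokenizer produced and wrap it in a conversion when the comparison is built. Expression.Equal throws an InvalidOperationException when its operands are a double and an Int32, but Expression.Convert can bring the constant to the member's type first. This is a sketch under assumptions: Account and ComparisonBuilder are hypothetical names, only the eq operator is shown, and the conversion only works where a conversion between the two types is defined (Int32 to double is fine, Int32 to Guid is not).

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entity with a double property, as in the SecurityLevel example.
public class Account
{
    public double SecurityLevel { get; set; }
}

public static class ComparisonBuilder
{
    // Builds x => x.<propertyName> == literal, converting the literal's
    // type to the property's type when the two differ.
    public static Expression<Func<T, bool>> EqualTo<T>(string propertyName, object literal)
    {
        var param = Expression.Parameter(typeof(T), "x");
        var member = Expression.Property(param, propertyName);

        Expression constant = Expression.Constant(literal);
        if (constant.Type != member.Type)
        {
            // e.g. Int32 constant 10 becomes (double)10
            constant = Expression.Convert(constant, member.Type);
        }

        return Expression.Lambda<Func<T, bool>>(Expression.Equal(member, constant), param);
    }
}
```

Called as ComparisonBuilder.EqualTo<Account>("SecurityLevel", 10), this yields a predicate that compares the double property against the converted constant. Converting the constant rather than the member keeps the property access untouched, which may matter if the expression is later translated by a query provider.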