datalust / superpower

A C# parser construction toolkit with high-quality error reporting
Apache License 2.0
1.05k stars 98 forks source link

Added Span.Until to Superpower.Parsers #129

Closed Ellested closed 3 years ago

Ellested commented 3 years ago

I rely on the compiler to inline functions as necessary, so I have not done any optimizations.

nblumhardt commented 3 years ago

Thanks! We'll take a look as soon as we have some time free to think it through 👍

nblumhardt commented 3 years ago

Sorry about the slow turnaround on this. I have a few thoughts but haven't been through in enough detail to be able to offer a useful review just yet - I'll add some comments anyway in case they're useful to move things forwards.

Ellested commented 3 years ago

Is it that bad :-)

The only speed improvement I currently see is to avoid the remainder pattern. I'm actually a little puzzled on why you can only consume one char on the TextSpan. Why not just advance the position and keep consuming? Of course the struct is allocated on the stack, but there's still some copying that needs to through, and the only difference is the position. Well, I guess that's just my limited knowledge of not working with the details.

nblumhardt commented 3 years ago

Hehe 😄 no it's not bad at all! Just my week that got out of control 😅

I think this is good, I just need to give it another look over and hit the button, but getting back into the right head-space takes a bit of time. Sorry it's a a slow process right now!

I haven't had a chance to check out what you mean regarding the character-by-character consumption, but I suspect it's because of the line number tracking Superpower does. There are definitely still opportunities to improve all over the place, though.

Ellested commented 3 years ago

I'm glad that you didn't press that button. There was a major issue with the calculation of the found position, as it didn't include the input offset. I found out, when I used the new version today in my other project :-)

Regarding my question consider the ConsumeChar:

public Result<char> ConsumeChar()
{
    EnsureHasValue();
    if (IsAtEnd)
    {
        return Result.Empty<char>(this);
    }
    char ch = Source[Position.Absolute];
    return Result.Value(ch, this, new TextSpan(Source, Position.Advance(ch), Length - 1));
}

My question is, why it's returning a new TextSpan everytime we consume a char. Why not just keep the current TextSpan, and advance the position until something interesting happens? I'm only interested in the character, but I get a complete result struct where only the position and length is changed. So even though it's a struct, it is still memory initialization/copying of bytes, that weighs in tight loops.

I'm only asking because I'm guessing and haven't spend any time on stuff like this before. But I'm super happy about starting to get deeper into it, as it opens up a wealth of new opportunities.

PS. How about a PeekChar method :-)

nblumhardt commented 3 years ago

Glad you were able to spot the issue!

TextSpan is immutable so that parsers can treat them like bookmarks to try different options while moving backwards and forwards within the string. HTH!

SuperJMN commented 3 years ago

When is this going to be merged? 🤔

Ellested commented 3 years ago

I'm not sure whether it's me or Nicolas we are waiting for :-)

I would like to know if I should go for indexOf or keep the consume/remainder pattern before continuing. Maybe it could also be determined later. Might need a benchmark test to see which is better, as I think it will vary with the size of the scan.

Well, I will update according to the last comments, and then we can decide on the indexOf after that :-)

SuperJMN commented 3 years ago

@nblumhardt this looks OK to me!

nblumhardt commented 3 years ago

https://www.nuget.org/packages/Superpower/3.0.0-dev-00201 is out :-)

Thanks for sticking with this, @Ellested!

I made some tweaks on the way through - current version is at https://github.com/datalust/superpower/blob/dev/src/Superpower/Parsers/Span.cs#L275

Ellested commented 3 years ago

Looks great. I didn't realize the source string was available all this time, so I thought you meant I should use intput.ToStringValue() and then IndexOf() on the result. Now I get it :-)

nblumhardt commented 3 years ago

Ah - great! 👍 Thanks again!