corvus-dotnet / Corvus.Globbing

A zero allocation globbing library
Apache License 2.0

Show heap-allocated example for `GlobTokenizer.Tokenize`? #3

Open idg10 opened 2 years ago

The README.md shows how to use `stackalloc` to store the result of tokenizing the glob expression, and how to fall back to `ArrayPool` when there are too many tokens to reasonably put on the stack. However, the output of the parse is stored in a `ReadOnlySpan<GlobToken>`, so this technique can't hang onto the tokenized version beyond the duration of a single method call.

Given that one of our driving scenarios is stripping links from documents returned by a service, we're going to want to use the same glob patterns time and time again, across multiple requests over the lifetime of the service. With the techniques shown in the README.md we'd still end up reparsing the same patterns again and again: they can use the tokenized representation any number of times within a single method invocation, but they can't hold onto it beyond that invocation.

For these scenarios, it makes sense to parse once and then hang onto the result for longer, which would mean putting it on the heap.

This is easy enough to do: you can just use a `GlobToken[]` field with normal array allocation syntax instead of `Span<GlobToken>` with `stackalloc`. But it would be good for the docs to show this.

We could even consider adding a helper. You could write a struct that encapsulates this, something like:

```csharp
public readonly struct GlobParseReuser
{
    private readonly string pattern;
    private readonly GlobToken[] tokenizedGlob;

    public GlobParseReuser(string pattern)
    {
        this.pattern = pattern;

        // There can't be more tokens than there are characters in the glob pattern,
        // so we allocate an array at least that long.
        var tokenizedGlob = new GlobToken[pattern.Length];
        int tokenCount = GlobTokenizer.Tokenize(pattern, tokenizedGlob);
        // And then slice off the number of tokens we actually used.
        // Note: this ends up doing a second allocation. For the intended use case,
        // that's fine because the full-length one allocated initially will get
        // collected, and then this one with exactly the correct length is the
        // one that will hang around (most likely surviving to gen2, given the
        // intended usage).
        this.tokenizedGlob = tokenizedGlob[..tokenCount];
    }

    public bool Match(string value, StringComparison comparisonType = StringComparison.Ordinal)
        => Glob.Match(this.pattern, this.tokenizedGlob, value, comparisonType);
}
```
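
To make the "parse once, reuse across requests" scenario concrete, here is a rough sketch of how such a helper might be held for the lifetime of a service. `GlobCache` and `IsMatch` are hypothetical names, not part of Corvus.Globbing, and the sketch assumes the `GlobParseReuser` struct proposed above is in scope and compiles against the library as written.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical cache, not part of Corvus.Globbing. It assumes the
// GlobParseReuser struct sketched above is available.
public static class GlobCache
{
    // One parsed glob per distinct pattern, shared across all requests.
    private static readonly ConcurrentDictionary<string, GlobParseReuser> Parsed =
        new ConcurrentDictionary<string, GlobParseReuser>(StringComparer.Ordinal);

    public static bool IsMatch(string pattern, string value)
    {
        // GetOrAdd may invoke the factory more than once if two threads race
        // on the same new pattern, but only one result is stored. The extra
        // parse is wasted work rather than a correctness problem.
        GlobParseReuser glob = Parsed.GetOrAdd(pattern, p => new GlobParseReuser(p));
        return glob.Match(value);
    }
}
```

This only makes sense when the set of patterns is small and fixed (for example, patterns from configuration). If patterns can come from callers, the dictionary would grow without bound, and a plain `static readonly` field per known pattern is probably the simpler choice.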