Closed cocowalla closed 5 years ago
Interesting idea. I'd have to do some research on this. Always open to any PRs that improve perf.
Few things that spring to mind.
DotNet.Glob uses the string length in a few places to shortcut processing. Suppose you ask it to match a glob pattern that has a min length requirement of 5 - e.g *12345*
and you match this glob pattern against a few thousand strings. It won't bother evaluating any of those strings that are less than 5 in length - it can immediately reject them because they fail the min length heuristics. If we add support for span then need to consider if we lose this length informatioin and the ability to do this short circuiting. Could that lead to a perf loss in some circumstances?
Also I was thinking, perhaps a Span
could be used to load each string into the same segment of memory, match against it there, then clear the span, and load in the next string into the same span. The idea is, can this be done, and would this save memory consumption when comparing against a large amount of strings / file names.
Not sure what you mean about losing the length information?
ReadOnlySpan<char> mySpan = "myergen".AsSpan();
var len = mySpan.Length; <--- = 7
Ah right I see. Im missing something though. Could you spell out for me where the perf improvement would come into play - doesnt that still take an allocated string and then access it it in a span ? So in that sense its not avoiding any string allocation. Or am I missing something..
A typical use for globbing is matching paths in a file system, where you'd recurse down through a directory and call Glob.IsMatch
on each path to see if it's a match. Until recently, that meant allocating rather a lot of strings - but with the new Span
-based System.IO.Path
APIs, it's actually possible to do this with very little allocations, which means that callers may have a span, rather than a string. I was thinking of overloading Glob.IsMatch:
public bool IsMatch(string subject)
public bool IsMatch(ReadOnlySpan<char> subject)
There would be quite a bit of work to get that working behind the scenes though.
I'm sure it's obvious, but I should add that this is a 'nice to have', rather than a 'must have' - Span
and friends are in .NET Core 2.1, but won't make it into Standard until .NET Standard 2.1. There are also some serious issues with the new, low allocating file system enumeration that means it's not practical to use it yet (disappointingly, due to known non-production ready code being shipped in .NET Core 2.1).
@cocowalla thanks for the detailed explaination - this makes sense. It sounds like this would be a good future feature to add when things are a bit more stable then. Let me know when you think that point is reached and we can re-look at this.
I added span support and an additional target framework. #62
Once netstandard2.1 ships netcoreapp2.1 can be changed. @dazinator I'm actually using this library with FileSystemEnumerable right now haven't tested the perf different with Span
Thanks @pwiens - have merged the PR, can close this one now ;-)
I am going to do a little bit more work on this.
I have:
ReadOnlySpan<char>
on .NET Core 2.1..NET Standard 2.1
as a compilation target, in addition to .NET Core 2.1
to that as other platforms implement .net standard they will also be able to consume this new API. I'll release version 3.0.0
shortly.
This has been released in version 3.0.0
- thanks for contributing!
Thanks man! First meaningful contribution to open source after 11 years in the industry. :)
With the recent 'spanification' of .NET Core, in particular the addition of the Span-based
FileSystemEnumerable
andSystem.IO.Path
Span-based methods, I'm wondering if it would be possible to add Span-based methods toGlob
? The point would be to avoid allocating strings when performing Glob-based matching.