Open arlyon opened 1 year ago
Yeah, at the moment it just treats the glob as bytes. I think we could probably do this in a way that isn't too perf-intensive: for example, treating the glob as bytes to find special characters (e.g. `*`, `[`, etc.), but then interpreting the contents of a character class as Unicode characters.
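The hybrid approach described above can be sketched roughly as follows (a minimal illustration, not the library's actual parser; all names here are hypothetical, and edge cases like escaped brackets are ignored):

```python
# Sketch: scan the glob pattern byte-by-byte for metacharacters, but
# decode the body of a character class as UTF-8 so multi-byte
# characters are kept whole.
def parse_glob(pattern: bytes):
    tokens = []
    i = 0
    while i < len(pattern):
        b = pattern[i]
        if b == ord('*'):
            tokens.append(('star',))
            i += 1
        elif b == ord('['):
            # naive scan for the closing bracket (ignores escapes)
            j = pattern.index(ord(']'), i + 1)
            # decode only the class body; the rest stays as raw bytes
            body = pattern[i + 1:j].decode('utf-8')
            negated = body.startswith(('^', '!'))
            chars = set(body[1:] if negated else body)
            tokens.append(('class', negated, chars))
            i = j + 1
        else:
            tokens.append(('lit', b))
            i += 1
    return tokens
```

The point of the design is that the hot path (scanning for `*` and `[`) never pays for UTF-8 decoding; only the comparatively rare character-class bodies do.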
Hi! It would be nice if this library specified how it handles multi-codepoint characters or graphemes (🎉). I was comparing this against the doublestar Go library (https://github.com/bmatcuk/doublestar), which seems to handle Unicode, whereas this library evaluates globs at the codepoint level, so certain things don't line up.
Example: `a[^b]c` matches `acc`, but not `a🔥c`. Of course, emoji is a simple example, but there are large volumes of 'regular' Unicode, such as other-language characters, that could end up in paths. I am willing to contribute (and have started) a feature-flag toggle that enables this, since it will presumably be more performance-intensive than simply going char-for-char when looking for grapheme boundaries. I would not expect this to work with ranges (to me that should be undefined), though we could compare codepoints numerically, i.e. `low_u32 <= var <= high_u32`.
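For reference, Python's standard-library `fnmatch` illustrates the same layering (this is just a demonstration of the general issue, not of this library's behavior): a character class consumes one codepoint in str mode and one byte in bytes mode, and a multi-codepoint grapheme fails even at the codepoint level:

```python
import fnmatch

# Codepoint-level matching: [!b] consumes exactly one codepoint, so the
# single-codepoint emoji U+1F525 matches.
print(fnmatch.fnmatch('a\U0001F525c', 'a[!b]c'))            # True

# Byte-level matching: the same emoji is 4 UTF-8 bytes, so it cannot
# fill a single-byte character class.
print(fnmatch.fnmatch('a\U0001F525c'.encode(), b'a[!b]c'))  # False

# The grapheme 'e' + combining acute accent (U+0301) is two codepoints,
# so even codepoint-level matching rejects it.
print(fnmatch.fnmatch('ae\u0301c', 'a[!b]c'))               # False
```

This is why grapheme-aware matching needs an explicit segmentation pass (e.g. Unicode extended grapheme clusters) rather than just decoding UTF-8.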
Thanks for the lib!
Alex