corvus-dotnet / Corvus.Globbing

A zero allocation globbing library
Apache License 2.0

Do we need versions of `Glob.Match` that work with UTF-8 strings? #5

Open idg10 opened 2 years ago

idg10 commented 2 years ago

The two overloads of Glob.Match currently take ReadOnlySpan<char>, so the text must be in UTF-16 format. Do we need to support matching directly against UTF-8 text?
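For context, here is a minimal sketch of what a caller holding UTF-8 bytes has to do today: transcode into a pooled UTF-16 buffer, then hand the result to a span-based matcher. The `Utf8GlobSketch` class, the `SpanMatcher` delegate, and the `MatchUtf8` method are hypothetical names invented for illustration; they are not part of Corvus.Globbing.

```csharp
using System;
using System.Buffers;
using System.Text;

public static class Utf8GlobSketch
{
    // Hypothetical delegate standing in for a span-based matcher such as Glob.Match.
    public delegate bool SpanMatcher(ReadOnlySpan<char> pattern, ReadOnlySpan<char> value);

    public static bool MatchUtf8(SpanMatcher matcher, ReadOnlySpan<char> pattern, ReadOnlySpan<byte> utf8Value)
    {
        // GetMaxCharCount returns a safe upper bound on the decoded UTF-16 length.
        int maxChars = Encoding.UTF8.GetMaxCharCount(utf8Value.Length);
        char[] rented = ArrayPool<char>.Shared.Rent(maxChars);
        try
        {
            // Decode the UTF-8 bytes into the rented buffer; this is the copy
            // that a native UTF-8 overload would avoid.
            int written = Encoding.UTF8.GetChars(utf8Value, rented);
            return matcher(pattern, rented.AsSpan(0, written));
        }
        finally
        {
            // Hand the buffer back so later calls can reuse it.
            ArrayPool<char>.Shared.Return(rented);
        }
    }
}
```

Assuming Glob.Match accepts a pattern span and a value span as described above, a call might look like `Utf8GlobSketch.MatchUtf8((p, v) => Glob.Match(p, v), "path/*/file", utf8Bytes)`. Renting from the pool avoids a per-call array allocation, but the transcoding work remains.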

mwadams commented 2 years ago

I think this would be a good thing to support and we could open an issue to do so.

At the moment, though, System.Text.Json doesn't give us a way to get at the UTF-8 bytes, so the first use case I have in mind couldn't use UTF-8 overloads.

idg10 commented 2 years ago

Ah yes, I keep forgetting that there isn't a straightforward way to get at the UTF-8, although you can get it to write the UTF-8 out into an IBufferWriter<byte>. With buffer pooling that can be alloc-free per-iteration. (Doesn't avoid the copy of course, but I presume that direct access to the underlying buffer is deliberately not allowed because there might not actually be one—maybe the original doc was actually UTF-16 encoded, or perhaps it's split across multiple buffers.)

But I wasn't expecting to implement this immediately anyway—it was more a place for discussion and perhaps an eventual "defer/don't/yes" decision. So I think we're on "defer" right now.
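The IBufferWriter<byte> route mentioned above can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the library: `JsonUtf8Extractor` is a hypothetical helper, and it reuses a single `ArrayBufferWriter<byte>` rather than a true buffer pool. Note that the output is JSON text, so a string value comes back with its surrounding quotes and any JSON escaping still in place.

```csharp
using System;
using System.Buffers;
using System.Text.Json;

// Hypothetical helper: copies a JsonElement's JSON text into a reusable UTF-8 buffer.
public sealed class JsonUtf8Extractor : IDisposable
{
    private readonly ArrayBufferWriter<byte> buffer = new ArrayBufferWriter<byte>();
    private readonly Utf8JsonWriter writer;

    public JsonUtf8Extractor()
    {
        this.writer = new Utf8JsonWriter(this.buffer);
    }

    // The returned span is only valid until the next call to GetUtf8.
    public ReadOnlySpan<byte> GetUtf8(JsonElement element)
    {
        // Reset writer state and discard earlier output so both objects are reused.
        this.writer.Reset();
        this.buffer.Clear();

        // Still a copy, but into memory we already own once the buffer has grown.
        element.WriteTo(this.writer);
        this.writer.Flush();
        return this.buffer.WrittenSpan;
    }

    public void Dispose() => this.writer.Dispose();
}
```

Once the internal buffer has grown to fit the largest value seen, repeated calls should not need to allocate again, which is the per-iteration behaviour described above. A UTF-8 matcher consuming this output would still have to strip the quotes and handle escapes for string values.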

mwadams commented 1 year ago

Now that we have a (fairly) efficient way of getting at the UTF-8 text, this issue becomes "a good idea".