Open PaulusParssinen opened 4 months ago
Tagging subscribers to this area: @dotnet/area-system-text-encoding See info in area-owners.md if you want to be subscribed.
Correct area probably CodeGen-coreclr 😆
For constant UTF-8 strings, is u8
string literal an option for you? If not, what's preventing it?
u8
is always the better choice if available.
The ReadUTF8
expansion seems to be able to handle any object handle that is IsKnownImmutable
e.g. frozen objects such as static readonly string
fields. Also in my original plan of simplifying UTF-8 string builder with small custom interpolated string handler it is a problem because the string literals aren't passed as utf-8 RVA spans but as normal strings (we don't have https://github.com/dotnet/csharplang/issues/7072 yet).
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
I think it should be fairly trivial to intrinsify GetByteCount
indeed
I have been recently staring at a lot of UTF-8 <-> UTF-16 manipulation code (in Garnet and ILCompiler name mangling 😆) and thought that if
UTF8EncodingSealed.ReadUTF8
is getting VN expansion when JIT can see the input is CNS, maybe it could be done forEncoding.UTF8.GetByteCount
too, which is most commonly used to calculate the buffer for the following(Try)GetBytes
call (or if the input fits to the existing buffer).We of course have
GetMaxByteCount
for the fast calculation of upper-bound, but doing thestr.Length * 3
makes the buffer lengths go over any stackalloc thresholds very quickly.This is very much inspired after and in theme with https://github.com/dotnet/runtime/pull/85328 (and then painfully hitting https://github.com/dotnet/runtime/issues/93501 😢 )