dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.07k stars 4.69k forks source link

Intrinsify Encoding.UTF8.GetByteCount for constant UTF-16 input #102246

Open PaulusParssinen opened 4 months ago

PaulusParssinen commented 4 months ago

I have been recently staring at a lot of UTF-8 <-> UTF-16 manipulation code (in Garnet and ILCompiler name mangling 😆) and thought that if UTF8EncodingSealed.ReadUTF8 is getting VN expansion when JIT can see the input is CNS, maybe it could be done for Encoding.UTF8.GetByteCount too, which is most commonly used to calculate the buffer for the following (Try)GetBytes call (or if the input fits to the existing buffer).

We of course have GetMaxByteCount for the fast calculation of upper-bound, but doing the str.Length * 3 makes the buffer lengths go over any stackalloc thresholds very quickly.

This is very much inspired after and in theme with https://github.com/dotnet/runtime/pull/85328 (and then painfully hitting https://github.com/dotnet/runtime/issues/93501 😢 )

dotnet-policy-service[bot] commented 4 months ago

Tagging subscribers to this area: @dotnet/area-system-text-encoding See info in area-owners.md if you want to be subscribed.

PaulusParssinen commented 4 months ago

Correct area probably CodeGen-coreclr 😆

huoyaoyuan commented 4 months ago

For constant UTF-8 strings, is u8 string literal an option for you? If not, what's preventing it?

PaulusParssinen commented 4 months ago

u8 is always the better choice if available.

The ReadUTF8 expansion seems to be able to handle any object handle that is IsKnownImmutable e.g. frozen objects such as static readonly string fields. Also in my original plan of simplifying UTF-8 string builder with small custom interpolated string handler it is a problem because the string literals aren't passed as utf-8 RVA spans but as normal strings (we don't have https://github.com/dotnet/csharplang/issues/7072 yet).

https://github.com/dotnet/runtime/blob/426edd093a8d83a45119716c0ae75592b5dc4812/src/coreclr/jit/helperexpansion.cpp#L1616-L1617

dotnet-policy-service[bot] commented 4 months ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

EgorBo commented 4 months ago

I think it should be fairly trivial to intrinsify GetByteCount indeed