dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

ARM64 SVE: Duplicating a scalar to a vector is non-optimal #108321

Open a74nh opened 1 month ago

a74nh commented 1 month ago

SVE provides LD1RW to load a single 32-bit value from memory and broadcast it to all lanes of a vector.

I'm not quite sure why we didn't add this to the SVE API.

However, this can be done via:

Vector<uint> vec = Sve.DuplicateSelectedScalarToVector(Sve.LoadVector(Sve.CreateTrueMaskUInt32(), input), 0);

Which produces:

            ptrue   p0.s
            ld1w    { z17.s }, p0/z, [x7]
            mov     z17.s, s17

This could be optimised to:

            ptrue   p0.s
            ld1rw   { z17.s }, p0/z, [x7]
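
To make the equivalence concrete, here is a minimal lane-level sketch using the portable `Vector<uint>` type rather than the `Sve` intrinsics, so it runs on any hardware. The class and method names (`BroadcastSketch`, `LoadThenDuplicate`, `LoadAndBroadcast`) are hypothetical and only model the two instruction sequences above; they are not part of the SVE API and say nothing about what the JIT emits for them.

```csharp
using System;
using System.Numerics;

// Hypothetical helper, for illustration only.
public static class BroadcastSketch
{
    // Models "ld1w" followed by "mov z.s, s": load a full vector,
    // then copy lane 0 into every lane.
    public static Vector<uint> LoadThenDuplicate(ReadOnlySpan<uint> input)
    {
        // Requires at least Vector<uint>.Count readable elements.
        Vector<uint> loaded = new Vector<uint>(input);
        return new Vector<uint>(loaded[0]);
    }

    // Models "ld1rw": read one 32-bit element and broadcast it to every lane.
    public static Vector<uint> LoadAndBroadcast(ReadOnlySpan<uint> input)
    {
        // Only input[0] is read.
        return new Vector<uint>(input[0]);
    }
}
```

Whenever the full-vector load is valid, both methods return the same vector. The second form only touches the first element, which is the property that the single `ld1rw` instruction relies on.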

Regardless of whether an API method is added, the JIT should perform this optimisation.

dotnet-policy-service[bot] commented 1 month ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch. See info in area-owners.md if you want to be subscribed.