dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

API Proposal: Support the use-cases of mask-extraction on Vector<T> #30569

Open IJzerbaard opened 5 years ago

IJzerbaard commented 5 years ago

Mask-extraction instructions, such as MOVMSKPS/D and PMOVMSKB, are commonly used to

All such uses could be supported by adding a low-level API such as:

public static ulong ExtractMask(Vector<T> x);
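To make the intended behaviour concrete, here is a minimal scalar sketch of what such an ExtractMask could compute. It assumes MOVMSKPS-like semantics (bit i of the result mirrors the most significant bit of element i); the class name MaskSketch and the restriction to Vector<int> are hypothetical, and a real implementation would not use a loop.

```csharp
using System;
using System.Numerics;

public static class MaskSketch
{
    // Scalar reference only: bit i of the result is set when element i has its
    // most significant bit set, which for int means the element is negative.
    // Comparison results (all-ones lanes) therefore map to set bits.
    public static ulong ExtractMask(Vector<int> x)
    {
        ulong mask = 0;
        for (int i = 0; i < Vector<int>.Count; i++)
        {
            if (x[i] < 0)
            {
                mask |= 1UL << i;
            }
        }
        return mask;
    }
}
```

For example, MaskSketch.ExtractMask(Vector.GreaterThan(data, Vector<int>.Zero)) would yield one bit per lane of data that is positive.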

But this has some issues,

Instead I propose supporting some narrower specific use cases, for example:

struct Vector<T> {
    public int CompressedCopyTo(Vector<T> mask, T[] destination, int startIndex);
}

CompressedCopyTo stores the elements selected by the mask and returns the number of elements written. Admittedly this has a problem: it cannot be used safely near the end of the destination array, even when startIndex plus the number of selected elements is less than the length of the destination. An entire vector is always stored; the selected elements are merely packed at its start. Using the masked store instruction is not a solution because it has a non-temporal hint, which makes it unsuitable for general use. AVX512 compress-store (e.g. VCOMPRESSPS with a memory destination) does not have this problem, but is not widely supported.
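The described semantics can be sketched with a scalar reference version. This is an assumption-laden illustration, not the proposed implementation: CompressSketch is a hypothetical static stand-in for the instance method, it is fixed to Vector<int>, and it treats any non-zero mask lane as "selected".

```csharp
using System;
using System.Numerics;

public static class CompressSketch
{
    // Hypothetical static stand-in for the proposed instance method.
    // Copies the elements of 'source' whose mask lane is non-zero into
    // 'destination', packed contiguously from 'startIndex', and returns
    // the number of elements written.
    public static int CompressedCopyTo(Vector<int> source, Vector<int> mask,
                                       int[] destination, int startIndex)
    {
        int written = 0;
        for (int i = 0; i < Vector<int>.Count; i++)
        {
            if (mask[i] != 0)
            {
                destination[startIndex + written] = source[i];
                written++;
            }
        }
        return written;
    }
}
```

Because this version writes only the selected elements, it stays within bounds whenever startIndex plus the returned count fits in the destination. A vectorized version that stores a whole vector at once would not have that property, which is the problem described above.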

There are some other issues,

This is a tough nut to crack but compression of filtered results is broadly applicable and currently impossible with the System.Numerics.Vector API.

To support getting the index of the first or last match, I propose:

public static int FirstIndexOf<T>(Vector<T> vector, T value);
public static int LastIndexOf<T>(Vector<T> vector, T value);
public static int FirstIndexOfNonZero<T>(Vector<T> vector); // optional
public static int LastIndexOfNonZero<T>(Vector<T> vector); // optional

With the usual semantics of returning the first or last index of a match if there is one, and -1 otherwise. Possible applications include:

Issue: should floating-point vectors be supported? They could be, but they raise questions about the precise semantics, e.g. does NaN equal itself for the purpose of finding its index, do 0.0 and -0.0 equal each other, etc.
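A scalar sketch of FirstIndexOf and LastIndexOf makes the floating-point question tangible. The class name IndexSketch is hypothetical, and the choice of EqualityComparer<T>.Default as the notion of equality is an assumption made for illustration, not part of the proposal.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class IndexSketch
{
    // Returns the lowest lane index whose element equals 'value', or -1.
    public static int FirstIndexOf<T>(Vector<T> vector, T value) where T : struct
    {
        var comparer = EqualityComparer<T>.Default;
        for (int i = 0; i < Vector<T>.Count; i++)
        {
            if (comparer.Equals(vector[i], value)) return i;
        }
        return -1;
    }

    // Returns the highest lane index whose element equals 'value', or -1.
    public static int LastIndexOf<T>(Vector<T> vector, T value) where T : struct
    {
        var comparer = EqualityComparer<T>.Default;
        for (int i = Vector<T>.Count - 1; i >= 0; i--)
        {
            if (comparer.Equals(vector[i], value)) return i;
        }
        return -1;
    }
}
```

With this choice, NaN is found (double.Equals treats NaN as equal to NaN) and 0.0 matches -0.0. A vectorized version built on an IEEE-style lane comparison such as Vector.Equals would instead never find NaN, so the two approaches give different answers for exactly the cases raised above.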

tannergooding commented 5 years ago

Thanks for logging this API proposal @IJzerbaard.

I'll give this a more thorough look over when I get into the office tomorrow and will likely leave it marked as api-suggestion for 1-2 weeks so that the community can provide any applicable feedback.

scalablecory commented 5 years ago

This or some variant would be nice.

tannergooding commented 5 years ago

CC. @CarolEidt

Do you have an opinion on how best we could support a scenario like this? Namely, there are some APIs (like mask extraction) which are good cross-platform candidates and which are generally more useful, but for which simply returning an int or long may not work, since the size of Vector<T> is not strictly defined.

Forcing users to always go through a Span<T> seems undesirable, but we may also need to eventually support things like Vector2048 (ARM SVE extensions support this, for example).

Potentially, something like this could be supported via more explicit APIs that operate directly on Vector128<T> and Vector256<T> in a cross-platform manner...
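As a rough sketch of the fixed-size direction: with a known 128-bit width, a mask always fits in a uint, so first-match lookup becomes a compare, a mask extraction, and a bit scan. The example assumes the cross-platform Vector128 helpers in System.Runtime.Intrinsics (Equals, Create, ExtractMostSignificantBits), which postdate this discussion and, as far as known here, ship in .NET 7 and later; FixedSizeSketch is a hypothetical name.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;

public static class FixedSizeSketch
{
    // Returns the index of the first lane equal to 'value', or -1 if none matches.
    public static int FirstIndexOf(Vector128<int> vector, int value)
    {
        // Lanes that match become all-ones; their top bits form a 4-bit mask.
        uint mask = Vector128.Equals(vector, Vector128.Create(value))
                             .ExtractMostSignificantBits();
        return mask == 0 ? -1 : BitOperations.TrailingZeroCount(mask);
    }
}
```

The same shape does not carry over directly to Vector<T>, since the return type of the mask extraction would have to accommodate whatever width the hardware provides.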

CarolEidt commented 5 years ago

This is definitely the sort of thing that requires some careful API design (and deeper thinking). This PR provides a good start in that it outlines a few of the key use cases. Further, I think we can now start thinking about Vector<T> as a higher-level abstraction for which we need not constrain the APIs to map to a single machine instruction, especially as we reduce the friction of utilizing the HW intrinsics to implement operations on Vector<T>.

I think we should consider adding cross-platform APIs that operate on fixed-size vectors (Vector128<T>, Vector256<T> and perhaps even Vector64<T>), but I think we should also continue to look at expanding the Vector<T> APIs.

GSPP commented 5 years ago

It would be good to write realistic prototype code to design an API like this. It can be hard to foresee which API shape practical code actually needs.