WebAssembly / flexible-vectors

Vector operations for WebAssembly
https://webassembly.github.io/flexible-vectors/

Immediates vs regular values for lane indices #46

Closed penzn closed 2 years ago

penzn commented 2 years ago

This was brought up by @rossberg at a meeting last year.

Currently the spec takes immediates for lane indices, as machine SIMD instructions do, but this is a challenge semantically: either validation of those immediates has to be deferred, or a module would be invalid if the underlying vector size is too small for the lane indices it contains.
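To make the validation problem concrete, here is a minimal sketch. It assumes a rule where an immediate is valid only if it names a lane that exists at the implementation's vector length. The names `lanes_for` and `immediate_is_valid` are hypothetical and not taken from the spec.

```python
def lanes_for(vector_bits: int, lane_bits: int) -> int:
    """Number of lanes of width lane_bits in a vector of vector_bits."""
    return vector_bits // lane_bits


def immediate_is_valid(lane_index: int, vector_bits: int, lane_bits: int) -> bool:
    """Hypothetical validation rule: an immediate must name an existing lane."""
    return 0 <= lane_index < lanes_for(vector_bits, lane_bits)


# The same immediate (lane 5 of 32-bit lanes) checked against two vector lengths.
valid_on_256 = immediate_is_valid(5, vector_bits=256, lane_bits=32)  # 8 lanes
valid_on_128 = immediate_is_valid(5, vector_bits=128, lane_bits=32)  # 4 lanes
```

Under this assumed rule the same module would validate on one implementation and fail on another, which is the situation described above.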

Changing lane indices to regular i32 values would require some optimization to produce simple machine instructions when the index is constant (and in many contexts it is). Another approach, perhaps as a supplement, would be to define operations that access lanes within the first 128 bits using immediate indices, since those are guaranteed to be valid. We need to weigh the pros and cons here, though from the spec's point of view, variable indices for general lane access are cleaner.
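A rough sketch of the supplementary idea, modelling a vector as a Python list of lanes. It assumes every implementation provides at least 128 bits, as the paragraph above implies. `LOW_PART_BITS`, `max_immediate_lanes` and `extract_lane_imm` are hypothetical names used only for illustration.

```python
LOW_PART_BITS = 128  # assumed minimum vector length, per the discussion above


def max_immediate_lanes(lane_bits: int) -> int:
    """Lanes reachable by an immediate index, independent of vector length."""
    return LOW_PART_BITS // lane_bits


def extract_lane_imm(vector: list, lane_bits: int, imm: int):
    """Hypothetical immediate form: imm must fall within the low 128 bits.

    In a real engine this check would happen once, at validation time;
    here it is modelled as a ValueError raised on use.
    """
    if not 0 <= imm < max_immediate_lanes(lane_bits):
        raise ValueError("immediate outside the low 128 bits")
    return vector[imm]
```

For example, with 32-bit lanes only immediates 0 through 3 would be accepted, regardless of whether the actual vector holds 4, 8 or 16 lanes.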

akirilov-arm commented 2 years ago

I am not sure how switching to i32 parameters as in #47 solves the underlying issue - it is still possible to pass an index that is larger than the current vector size. IMHO we should start with defining the out-of-bounds behaviour formally, as I have suggested in #35.

penzn commented 2 years ago

#47 now describes out-of-bounds behavior, PTAL.

There is a special case that addresses a limited number of lanes in the low part of the vector, either using immediates or as separate ops. For a scalar fallback this would be just lane 0, but for us it should be the lanes within reach of the existing SIMD standard.

For vectors longer than 128 bits, another concern is that extracting or replacing lanes in the upper portion of the vector is not as cheap as in the lower 128 bits, which can make these operations somewhat expensive (and in a rather non-uniform way as well).

The main options for variable indices are checking the index against the number of lanes and taking the index modulo the number of lanes.

Both have pros and cons. Checks are obviously expensive, but they can be elided in some cases, which would require some amount of dataflow analysis. Adding a modulo operation means one more instruction to execute in a possibly long lowering.
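The two behaviors compared above could be modelled roughly as follows, with a vector as a Python list and a non-negative index. The function names are hypothetical, and `IndexError` only stands in for whatever failure the spec would define for the checked variant.

```python
def extract_lane_checked(vector: list, index: int):
    """Checked semantics: an out-of-range index is an error.

    IndexError is an assumption standing in for the spec-defined failure.
    """
    if not 0 <= index < len(vector):
        raise IndexError(f"lane index {index} out of range for {len(vector)} lanes")
    return vector[index]


def extract_lane_modulo(vector: list, index: int):
    """Modulo semantics: the index wraps around the number of lanes."""
    return vector[index % len(vector)]
```

With four lanes, index 5 fails under the checked variant and selects lane 1 under the modulo variant. When the lane count is a power of two, the modulo reduces to a bitwise AND with `len(vector) - 1`.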

For code that doesn't rely on wrap-around semantics, checks would help catch unintended behavior, while code that does rely on it would not be a good fit for checked semantics. Which matters more depends on what code we can find to port to this proposal. For example, it seems that @jan-wassenberg's Highway project uses checked semantics, so modulo semantics would not apply to it.