Current SIMD Instructions ignore 0xfd (SIMD Prefix) and another SIMD prefix unspecified

rrwinterton commented 4 years ago

The current SIMD bytecode generation doesn't use the SIMD prefix. Something that could be done now to help future compatibility is to have the engine ignore the SIMD prefix in current versions. I don't know whether this is a big change to the engine. If it isn't, then when the tools are ready to generate extended SIMD instructions, the 0xfd prefix could be used for the new set of SIMD instructions (AVX2/AVXVL, ARM SVE). These would not have to be variable length as in the ARM spec; 0xfd could be implemented as an AVXVL instruction for both the 128-bit and 256-bit lengths, or as ARM SVE 256-bit instructions. That way a single bytecode could be generated that works now and in the future, when the tools support the 256-bit length, more hardware is out in the world to support it, and there is demand for it. New code generated with the prefix would run on older engines and older hardware because they would use the opcode without the prefix. We wouldn't need two binaries in the future.

lars-t-hansen commented 4 years ago

@rrwinterton, can you provide more context here? The SIMD spec clearly requires the 0xFD prefix for all the current SIMD instructions. Is this a proposal or are you reporting an error in a particular implementation?

tlively commented 4 years ago

This is a proposal to specify multiple possible prefixes for the SIMD operations so that in the future one of the prefixes could be reinterpreted to mean a larger vector length should be used. The idea is that if all current SIMD code is written to be generic over the vector length, then current SIMD code could automatically start using the larger vector lengths without changes. Since we can't guarantee that all code is generic over the vector length, though, that would be a breaking change. Additionally, we would have to do major work in the toolchain and on the spec side to allow code to be generic over the vector length at all. I think this idea is interesting, but it falls well outside the accepted goals of this proposal, couldn't achieve backwards compatibility, and would unacceptably delay shipping the current SIMD proposal.

rrwinterton commented 4 years ago

Thomas is right that the idea is to promote existing SIMD instructions to a larger vector length. The same opcode would be used for the larger vector lengths without defining a whole new set of instructions: just add a prefix. Instead of 0xfd 0x00 for a load, we would add a new encoding for the 256-bit load, say 0xfb 0x00, and for now the new prefix would behave the same as 0xfd. We may also need a 0xfa prefix that would act similarly to the ARM SVE2 predicate to help with loop control.

I am not sure why this would fail the backwards compatibility goal, since we aren't changing any executed code in the current design: engines would treat 0xfb and 0xfa as 0xfd, and the predicate could be a nop in the future. This proposal wouldn't require a tools change to release currently generated code. When we move forward we would just add the two prefixes, one for larger vector lengths and one for predicate-type functionality similar to SVE.

As far as the spec goes, I was thinking of defining two prefixes reserved for SIMD. As far as implementation goes, if these two prefixes are seen and the code generator doesn't support the extended prefixes yet (which none do), it would treat them as 0xfd. When we decide to look at longer SIMD, we can use the two prefixes: one to define the longer instruction and the second as a predicate value instruction.

What I mean by predicates is something ARM SVE does now. I can write this up in more detail and present it. Basically I am taking an idea similar to SVE (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100891_0607_00_en/wap1490203634804.html) and modifying it so that it is not a variable-length vector but still allows for a loop count "predicate", which makes it easier for the engine to generate the actual loop count and instructions.

I do agree with Thomas that anything we propose here shouldn't delay the current proposal. I just don't see how defining two or three prefixes to be treated as SIMD instructions, instead of just one, would delay shipping.
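To make the decoding side of this suggestion concrete, here is a minimal sketch of a decoder that accepts the two extra prefixes and treats them as 0xfd. It is an illustration only, not code from any engine: the names `ALIAS_PREFIXES`, `read_uleb128`, and `decode_simd` are hypothetical, 0xfb and 0xfa are just the example values used in this comment, and the sketch assumes the opcode following the prefix is encoded as an unsigned LEB128 integer.

```python
SIMD_PREFIX = 0xFD             # prefix used by the current SIMD proposal
ALIAS_PREFIXES = {0xFB, 0xFA}  # hypothetical reserved prefixes from this comment


def read_uleb128(code, pos):
    """Decode an unsigned LEB128 integer starting at code[pos]."""
    result = 0
    shift = 0
    while True:
        byte = code[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def decode_simd(code, pos=0):
    """Return (canonical_prefix, opcode, next_pos) for one SIMD instruction.

    An engine without wide-vector support maps the alias prefixes to 0xFD,
    so the same opcode decodes to the same 128-bit instruction.
    """
    prefix = code[pos]
    if prefix != SIMD_PREFIX and prefix not in ALIAS_PREFIXES:
        raise ValueError(f"not a SIMD prefix: {prefix:#04x}")
    opcode, next_pos = read_uleb128(code, pos + 1)
    return SIMD_PREFIX, opcode, next_pos
```

With this sketch, `decode_simd(bytes([0xFB, 0x00]))` and `decode_simd(bytes([0xFD, 0x00]))` produce the same result. It only shows that the aliasing is mechanically simple to decode; it says nothing about whether existing code remains correct once an alias prefix is later reinterpreted.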

binji commented 4 years ago

As an alternative, we could make the instructions parameterized on the stack type when/if we add v256 or other sizes. For example: i32x4.add currently has the signature [v128 v128] -> [v128]. But we could also allow [v256 v256] -> [v256] for the same opcode. We'd probably want to call it i32x8.add or i32xN.add in the text format, but the binary format could be unchanged.

If we did go this route, we'd only need to add the instructions that don't take a v128 value as a parameter, e.g. the *.load* and *.splat instructions.

We haven't done this for other instructions (e.g. i32.add vs. i64.add) so maybe we'd want to avoid it here too. But I thought it was worth mentioning.
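The stack-type parameterization described above can be sketched as a validation rule. This is a rough illustration under stated assumptions, not spec text or any engine's validator: `v256` is a hypothetical type, and `VECTOR_TYPES` and `check_vector_binop` are made-up names.

```python
VECTOR_TYPES = ("v128", "v256")  # v256 is hypothetical; only v128 exists today


def check_vector_binop(stack):
    """Type-check one binary vector op (e.g. the i32x4.add opcode).

    Pops two operands of the same vector type and pushes one result of
    that type, so a single opcode covers both
    [v128 v128] -> [v128] and [v256 v256] -> [v256].
    """
    if len(stack) < 2:
        raise TypeError("stack underflow")
    rhs = stack.pop()
    lhs = stack.pop()
    if lhs not in VECTOR_TYPES or lhs != rhs:
        raise TypeError(f"operand mismatch: {lhs}, {rhs}")
    stack.append(lhs)
    return stack
```

Instructions that produce a vector without consuming one, such as loads and splats, have no operand to infer the width from, which is why they would still need separate opcodes.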

penzn commented 4 years ago

A solution to extending SIMD to longer sizes has been covered in detail in the discussion on long vectors, see WebAssembly/simd#210, WebAssembly/flexible-vectors#2 - we are looking to present to the CG on 04-28 to see if we can turn that into a proposal.

Having a 256-bit variety is a bit tricky, as it would not work on existing Arm processors (aside from the ones supporting SVE, which are not very common yet).

penzn commented 4 years ago

I think this discussion can become part of the new proposal - we are hoping to get a repo set up if there is consensus in the CG. There was consensus in the Wasm SIMD sync on 04/03 to go to a phase 0 discussion for the previously presented solution.

WebAssembly / flexible-vectors

Current SIMD Instructions ignore 0xfd (SIMD Prefix) and another SIMD prefix unspecified #3