WebAssembly / relaxed-simd

Relax the strict determinism requirements of SIMD operations.
Other
38 stars 6 forks source link

Rounding semantics for `i32x4.relaxed_dot_i8x16_i7x16_add_s` #129

Closed alexcrichton closed 1 year ago

alexcrichton commented 1 year ago

I've stumbled upon an issue where the Overview.md isn't clear enough to me to figure out what to do and the included spec tests in this repository disagree with the implementation of this instruction in both v8 and SpiderMonkey. This instruction in these engines is implemented, on x86_64, with:

The spec tests currently have a test that looks like:

;; signed * unsigned   : -128 *  129 * 4 = -66,048 (+ 1)
;; signed * signed     : -128 * -127 * 4 =  65,024 (+ 1)
;; unsigned * unsigned :  128 *  129 * 2 =  66,048 (+ 1)
(assert_return (invoke "i32x4.relaxed_dot_i8x16_i7x16_add_s"
                       (v128.const i8x16 -128 -128 -128 -128 0 0 0 0 0 0 0 0 0 0 0 0)
                       (v128.const i8x16 -127 -127 -127 -127 0 0 0 0 0 0 0 0 0 0 0 0)
                       (v128.const i32x4 1 2 3 4))
               (either
                 (v128.const i32x4 -66047 2 3 4)
                 (v128.const i32x4  65025 2 3 4)
                 (v128.const i32x4  66049 2 3 4)))

but the implementation on x64 matches none of these answers. The first row is intended to be selected on x64 but the pmaddubsw does a saturating multiply-and-add so the intermediate result of the pmaddubsw instruction is -32768, not the intended -33024. This means that the final result of this instruction sequence is -65535, not the expected -66047 according to the spec test.

I'm not sure what the intention here and what's right and what's wrong. The Overview.md doesn't mention the saturating semantics so this lowering for x86_64 doesn't match the overview or the in-progress spec PR I think. On the other hand I suspect the intention of this instruction was to use pmaddubsw on x86_64 so it may be a case where the wording for this instruction needs to be updated?

While I'm here on this topic as well, some other side notes I would have are:

ngzhian commented 1 year ago

https://github.com/WebAssembly/relaxed-simd/issues/52 should be clearer on what the intention is.

The first row, -66047 is intended to be VPDPBUSD (AVX2-VNNI or AVX512-VNNI):

Multiply groups of 4 adjacent pairs of unsigned 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in src, and store the packed 32-bit results in dst.

I think the spec test (and overview) is missing the case that you mentioned using pmaddubsw + pmaddwd + paddd.

which should be as you said:

I'll add the missing test and update the overview.

Re: i16x8.relaxed_dot_i8x16_i7x16_s, the saturation should be allowed, #52 mentions it, it is missing from overview and likely spec text as well.

ngzhian commented 1 year ago

re: engine and spec tests, i appreciate all your reports on the bugs in the spec tests and will fix all of them. The spec test should at least cover all the edge cases, if I'm missing any, lmk.

I expect that engines will have their own tests, and then additionally run spec tests. It would be nice if spec tests is a superset of all engine tests, but it will require contribution effort.

I think engines don't run these tests yet, but as this proposal has been making swift progress, I think they are all starting to import and run the tests (and hence discover issues.)

alexcrichton commented 1 year ago

This was addressed in #130 so closing, thanks!

Also FWIW I think the testing situation isn't something that's specific to this proposal itself, it's sort of more broad for all proposals. I have found it nontrivial to contribute tests in the past which deters me from proposing more tests, but nevertheless this isn't the place to discuss the nuances of test contributions I mostly wanted to ask and see if there was a shared body of tests elsewhere but sounds like there isn't.

ngzhian commented 1 year ago

This was addressed in #130 so closing, thanks!

Also FWIW I think the testing situation isn't something that's specific to this proposal itself, it's sort of more broad for all proposals. I have found it nontrivial to contribute tests in the past which deters me from proposing more tests, but nevertheless this isn't the place to discuss the nuances of test contributions I mostly wanted to ask and see if there was a shared body of tests elsewhere but sounds like there isn't.

Agree that there is friction to contribute tests to the spec. You have been really helpful with your bug reports. If you find something lacking in the tests, please continue filing bugs, and I can turn it into spec tests and get your reviews on those. Thanks!

ngzhian commented 1 year ago

Updated spec text in https://github.com/WebAssembly/relaxed-simd/pull/115

I think the imul usage is correct, the imul is used in a helper operation that performs the (lane-wise) multiply (which can be signed signed or unsigned unsigned (depending on the host). The overflow happens when adding the intermediate 16-bit, which I was using a iadd, which is wrong.

I fixed that by making the addition a saturating addition.

In the signed signed case, the addition should not overflow `-128 -128 * 2 = 32768`, the saturation does nothing.

In the signed unsigned case `-128 129 * 2 = −33024 < -32768` so it saturates it.