VirtualTim opened 2 years ago
@VirtualTim would you like to make a PR for this?
I thought about it, but I wasn't completely confident that I was matching native/WASM SIMD operations correctly. Are there tests for this? I could take a crack at it.
Do you agree that `_mm_cvtss_si32` should directly map to `wasm_i32x4_trunc_sat_f32x4`?
Yes, there are fairly thorough tests for these intrinsics. And looking more closely, no, this Wasm instruction does not perfectly match this intrinsic. The intrinsic only converts and returns the first element of the vector, but the Wasm instruction produces a whole new converted vector. It's possible that `return wasm_i32x4_trunc_sat_f32x4((v128_t)__a)[0];` would work, though.
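Because `wasm_simd128.h` only builds for a Wasm target with SIMD enabled, here is a portable scalar model of the lane-0 idea instead. It is a sketch that assumes `i32x4.trunc_sat_f32x4_s` converts each lane independently (truncate toward zero, clamp to the int32 range, NaN to 0); the types `f32x4_model`/`i32x4_model` and both functions are hypothetical names for illustration, not part of any header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for 4-lane vectors (not real SIMD types). */
typedef struct { float lane[4]; } f32x4_model;
typedef struct { int32_t lane[4]; } i32x4_model;

/* Per-lane model of Wasm i32x4.trunc_sat_f32x4_s:
   truncate toward zero, clamp to the int32 range, NaN becomes 0. */
static i32x4_model model_i32x4_trunc_sat(f32x4_model a) {
    i32x4_model r;
    for (int i = 0; i < 4; i++) {
        float x = a.lane[i];
        if (x != x)                  r.lane[i] = 0;          /* NaN */
        else if (x < -2147483648.0f) r.lane[i] = INT32_MIN;  /* clamp low */
        else if (x >= 2147483648.0f) r.lane[i] = INT32_MAX;  /* clamp high */
        else                         r.lane[i] = (int32_t)x; /* truncates */
    }
    return r;
}

/* Mirrors `wasm_i32x4_trunc_sat_f32x4(a)[0]`: convert all four lanes and
   keep only lane 0, the way a scalar _ss intrinsic only uses element 0. */
static int32_t model_cvttss_via_lane0(f32x4_model a) {
    return model_i32x4_trunc_sat(a).lane[0];
}
```

The upper three lanes are converted and then discarded, so this can only ever match the scalar intrinsic for element 0.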
Looking into this a bit more, I think these are not 100% compatible. The SSE/SSE2 cvt operations round using either the current rounding mode or truncation, while the WASM SIMD spec says rounding is towards zero. So I think we can replace `_mm_cvttss`, but not `_mm_cvtss`.
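To make the rounding difference concrete, here is a portable scalar model. It assumes the default MXCSR rounding mode (round to nearest, ties to even) for the non-truncating `cvtss2si`, and only covers in-range inputs, since out-of-range handling is a separate question. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of cvttss2si (and of Wasm trunc_sat) for in-range inputs:
   C's float-to-int conversion always truncates toward zero. */
static int32_t model_truncate(float x) {
    return (int32_t)x;
}

/* Model of cvtss2si under the default MXCSR mode (round to nearest,
   ties to even). Assumes |x| < 2^31; another MXCSR mode would differ. */
static int32_t model_round_nearest_even(float x) {
    int64_t t = (int64_t)x;      /* truncated integer part */
    float frac = x - (float)t;   /* exact: the fractional part of a float */
    if (frac > 0.5f || (frac == 0.5f && (t & 1)))
        t += 1;                  /* round up, or break a tie toward even */
    else if (frac < -0.5f || (frac == -0.5f && (t & 1)))
        t -= 1;                  /* same rule for negative inputs */
    return (int32_t)t;
}
```

For 2.7f the truncating model gives 2 while the round-to-nearest model gives 3, which is why only the truncating (`cvtt*`) variants line up with Wasm's `trunc_sat`.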
I think `_mm_cvttps_epi32` could also be replaced with just `return wasm_i32x4_trunc_sat_f32x4(__a);`. There are probably a few others in `emmintrin.h` that directly map to WASM SIMD. Also, a few of those should be annotated with `DIAGNOSE_SLOW`, as they are going to be really slow.
I'm also struggling to see how SSE deals with saturation. I assume `_mm_cvttps_epi32` saturates, but I can't see any documentation on that. I'm feeling a little out of my depth here.
A good way to figure out what the intrinsics do is to:
cvttps2dq
> If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.
So it always returns `INT32_MIN` for out-of-bounds values.
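Putting that next to the Wasm semantics shows where a direct `wasm_i32x4_trunc_sat_f32x4` mapping would diverge from `cvttps2dq`. This is a portable per-lane sketch that assumes masked exceptions (so x86 returns 0x80000000 for NaN and any out-of-range input) and Wasm's saturating behavior (clamp, NaN to 0); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 2^31 is exactly representable as a float; NaN fails both comparisons. */
static int in_i32_range(float x) {
    return x >= -2147483648.0f && x < 2147483648.0f;
}

/* x86 cvttps2dq / cvttss2si for one lane, exceptions masked:
   NaN and out-of-range values become the "integer indefinite" 0x80000000. */
static int32_t model_x86_cvtt(float x) {
    return in_i32_range(x) ? (int32_t)x : INT32_MIN;
}

/* Wasm i32x4.trunc_sat_f32x4_s for one lane:
   clamp to the int32 range, NaN becomes 0. */
static int32_t model_wasm_trunc_sat(float x) {
    if (x != x) return 0;
    if (x < -2147483648.0f) return INT32_MIN;
    if (x >= 2147483648.0f) return INT32_MAX;
    return (int32_t)x;
}
```

The two agree for in-range inputs and for large negative values, but differ for large positive values (`INT32_MIN` vs `INT32_MAX`) and for NaN (`INT32_MIN` vs 0), so a bare `wasm_i32x4_trunc_sat_f32x4` would need a fix-up to match `_mm_cvttps_epi32` exactly.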
Thanks @ngzhian, I was already aware of the first link, but not of the second.
How does Emscripten deal with exception masking? I assume that we don't support SIMD operations throwing exceptions?
Yup, no exceptions. We assume that every time the instruction docs say "if exception is masked", it is masked.
For example, `_mm_cvtss_si32` (intel doco link) is implemented as:

This is obviously going to be slow, likely slower than not using SIMD at all. The WASM SIMD spec was updated at some point (link) with double/float <=> int conversions. The above code could probably be implemented like:
Browser support is tracked here (sadly without version numbers), and these conversion operations all appear to be in the latest `wasm_simd128.h` from LLVM.
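If exact x86 behavior is wanted on top of the new conversion op, one possible approach is to convert with saturation and then patch the lanes where the input was NaN or out of range, the way a SIMD compare mask plus a bitwise select would. Below is a scalar, portable model of that idea. It is an assumption about how an emulation could work, not Emscripten's implementation, and the helper names are hypothetical; in real Wasm code the mask and select would presumably come from `wasm_f32x4_ge`/`wasm_f32x4_lt` and `wasm_v128_bitselect`.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar stand-in for one lane of wasm_i32x4_trunc_sat_f32x4. */
static int32_t lane_trunc_sat(float x) {
    if (x != x) return 0;
    if (x < -2147483648.0f) return INT32_MIN;
    if (x >= 2147483648.0f) return INT32_MAX;
    return (int32_t)x;
}

/* Emulate cvttps2dq on 4 lanes: saturating convert, then select
   0x80000000 wherever the input was NaN or outside [-2^31, 2^31). */
static void emulate_cvttps_epi32(const float in[4], int32_t out[4]) {
    for (int i = 0; i < 4; i++) {
        int32_t converted = lane_trunc_sat(in[i]);
        /* Lane mask like a SIMD compare: all ones if in range, else zero.
           NaN compares false, so it falls into the out-of-range case. */
        int32_t mask = (in[i] >= -2147483648.0f && in[i] < 2147483648.0f) ? -1 : 0;
        /* Bitwise select: keep the converted value where mask is set,
           otherwise use the x86 "integer indefinite" value. */
        out[i] = (converted & mask) | (INT32_MIN & ~mask);
    }
}
```

This only covers the truncating conversion. For `_mm_cvtss_si32`, assuming the default MXCSR rounding mode, one option to try would be rounding the input first with Wasm's `f32x4.nearest` (ties to even) and then applying the same conversion and fix-up.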
@tlively you seem like the expert on this stuff.