Update double/float <=> int SIMD operations to use new WASM SIMD operations

VirtualTim commented 2 years ago

For example, _mm_cvtss_si32 (Intel docs link) is implemented as:

static __inline__ int __attribute__((__always_inline__, __nodebug__, DIAGNOSE_SLOW)) _mm_cvtss_si32(__m128 __a)
{
  int x = lrint(((__f32x4)__a)[0]);
  if (x != 0 || fabsf(((__f32x4)__a)[0]) < 2.f)
    return x;
  else
    return (int)0x80000000;
}

This is obviously going to be slow, likely slower than not using SIMD at all. The WASM SIMD spec was updated at some point (link) to add double/float <=> int conversions. The above code could probably be implemented as:

static __inline__ int __attribute__((__always_inline__, __nodebug__)) _mm_cvtss_si32(__m128 __a)
{
  return wasm_i32x4_trunc_sat_f32x4((v128_t)__a);
}

Browser support is tracked on a linked page (sadly without version numbers), and these conversion operations all appear to be in the latest wasm_simd128.h from LLVM.

@tlively you seem like the expert on this stuff.

tlively commented 2 years ago

@VirtualTim would you like to make a PR for this?

VirtualTim commented 2 years ago

I thought about it, but I wasn't completely confident that I was matching native and WASM SIMD operations correctly. Are there tests for this? I could take a crack at it.

Do you agree that _mm_cvtss_si32 should directly map to wasm_i32x4_trunc_sat_f32x4?

tlively commented 2 years ago

Yes, there are fairly thorough tests for these intrinsics. And looking more closely, no, this Wasm instruction does not perfectly match this intrinsic. The intrinsic only converts and returns the first element of the vector, but the Wasm instruction produces a whole new converted vector. It's possible that return wasm_i32x4_trunc_sat_f32x4((v128_t)__a)[0]; would work, though.

VirtualTim commented 2 years ago

Looking into this a bit more, I think these are not 100% compatible. The SSE/SSE2 cvt operations round using either the current rounding mode or truncation, while the WASM SIMD spec says rounding is towards zero. So I think we can replace _mm_cvttss, but not _mm_cvtss.

I think _mm_cvttps_epi32 could also be replaced with just return wasm_i32x4_trunc_sat_f32x4(__a);. There are probably a few others in emmintrin.h that map directly to WASM SIMD. A few functions there should also be annotated with DIAGNOSE_SLOW, as they are going to be really slow.

I'm also struggling to see how SSE deals with saturation. I assume _mm_cvttps_epi32 saturates, but I can't find any documentation on that.

I'm feeling a little out of my depth here.

ngzhian commented 2 years ago


A good way to figure out what the intrinsics do is to:

1. Look it up in the Intel Intrinsics Guide (https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_cvttps_epi32&ig_expand=2490).
2. Find the actual instruction name, in this case cvttps2dq.
3. Look up the instruction in Intel's ISA manual, or on this excellent website: https://www.felixcloutier.com/x86/cvttps2dq

If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.

In other words, with the exception masked it always returns INT32_MIN (0x80000000) for out-of-range inputs (and for NaN), rather than saturating.

VirtualTim commented 2 years ago

Thanks @ngzhian, I was already aware of the first link, but not of the second.

How does Emscripten deal with exception masking? I assume that we don't support SIMD operations throwing exceptions?

ngzhian commented 2 years ago

Yup, no exceptions. Assume that whenever the instruction reference says "if this exception is masked", it is masked.

emscripten-core / emscripten

Update double/float <=> int SIMD operations to use new WASM SIMD operations #16544