256-bit AVX intrinsics support

Compiling existing x86 SSE/AVX SIMD code to WASM SIMD is very attractive: developers can reuse existing libraries without rewriting them. However, currently only the 128-bit subset of the AVX intrinsics is supported, and much existing code does not fit within this restriction. Adding 256-bit AVX intrinsics support would expand the applicable scenarios and may also improve performance. Does Emscripten have a plan for this?

Google Highway currently supports a WASM_EMU256 target (a 2x unrolled version of wasm128). In addition, a re-vectorization optimization phase is being developed in Google's V8 JavaScript engine, which can pack two SIMD128 nodes into one SIMD256 node.

Sample code for AVX intrinsics support:

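/* A 256-bit vector represented as two 128-bit halves. */
typedef struct Vec256 {
  __m128 v0; /* lower four floats */
  __m128 v1; /* upper four floats */
} __m256;

/* Add eight packed floats by adding each 128-bit half separately. */
static __inline__ __m256 __attribute__((__always_inline__, __nodebug__))
_mm256_add_ps(__m256 __a, __m256 __b) {
  __m256 c;
  c.v0 = (__m128)wasm_f32x4_add((v128_t)__a.v0, (v128_t)__b.v0);
  c.v1 = (__m128)wasm_f32x4_add((v128_t)__a.v1, (v128_t)__b.v1);
  return c;
}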
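
To show how the same two-half pattern could extend to loads and stores, here is a minimal sketch that builds and runs natively with GCC or Clang. It is only an illustration, not Emscripten's implementation: the names f32x4_emu, m256_emu, emu256_loadu_ps, emu256_storeu_ps and emu256_add_ps are hypothetical, and a compiler vector extension type stands in for __m128 / v128_t so that the code does not depend on wasm_simd128.h.

```c
#include <string.h>

/* Portable stand-in for one 128-bit lane of four floats (GCC/Clang vector
   extension). Assumption: in Emscripten this role would be played by
   __m128 / v128_t. */
typedef float f32x4_emu __attribute__((vector_size(16)));

/* Hypothetical 256-bit type built from two 128-bit halves, mirroring the
   struct in the sample above. */
typedef struct {
  f32x4_emu v0; /* elements 0..3 */
  f32x4_emu v1; /* elements 4..7 */
} m256_emu;

/* Unaligned load of eight floats: one 16-byte copy per half. */
static inline m256_emu emu256_loadu_ps(const float *p) {
  m256_emu r;
  memcpy(&r.v0, p, sizeof r.v0);
  memcpy(&r.v1, p + 4, sizeof r.v1);
  return r;
}

/* Unaligned store of eight floats: one 16-byte copy per half. */
static inline void emu256_storeu_ps(float *p, m256_emu a) {
  memcpy(p, &a.v0, sizeof a.v0);
  memcpy(p + 4, &a.v1, sizeof a.v1);
}

/* Element-wise add: each half is added independently, so one 256-bit
   operation becomes two 128-bit operations. */
static inline m256_emu emu256_add_ps(m256_emu a, m256_emu b) {
  m256_emu r;
  r.v0 = a.v0 + b.v0;
  r.v1 = a.v1 + b.v1;
  return r;
}
```

Lane-wise operations such as add, mul or min map onto this layout directly. Operations that cross the 128-bit boundary (for example, permutes across lanes) would need extra shuffling between the two halves and are not covered by this sketch.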

emscripten-core/emscripten issue #21684