f16x8.demote_f64x2_zero

Do we need a direct conversion from F64 lanes to F16? We recently discovered that x64 has poor support for direct f64 -> f16 conversion: an F64 to F16 conversion with roundToEven semantics can't be implemented efficiently on x64 without AVX512-FP16 (so only Sapphire Rapids Xeons support it today).

We could introduce f16x8.demote_f64x2_zero with roundToZero rounding, so that it would be composable with the f64 -> f32 -> f16 conversions. However, roundToEven semantics are the default on the Web today, and such an instruction could easily be emulated with f32x4.demote_f64x2_zero + f16x8.demote_f32x4_zero.

On the other hand, Arm64 supports direct F64 -> F16 conversion with the roundToEven rounding mode, but without f16x8.demote_f64x2_zero it won't be possible to use it.
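To illustrate why composability matters, here is a minimal Python sketch of the double-rounding problem. It is not part of the proposal. It relies on the standard `struct` module, whose `e` and `f` formats convert to binary16 and binary32 with round-to-nearest-even. The helper names (`to_f16_rne`, `to_f32_rne`, `trunc_to_bits`) and the sample value are assumptions chosen for illustration, and the truncation helper ignores subnormals, overflow, and special values.

```python
import math
import struct

def to_f32_rne(x: float) -> float:
    """Round an f64 to f32 (round-to-nearest-even) and return it as a float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_f16_rne(x: float) -> float:
    """Round an f64 to f16 (round-to-nearest-even) and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def trunc_to_bits(x: float, p: int) -> float:
    """Hypothetical helper: round toward zero to a p-bit significand.
    Only valid for normal, in-range values."""
    m, e = math.frexp(x)               # x = m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(math.trunc(m * 2**p), e - p)

# A value just above the midpoint between the f16 neighbours 1.0 and 1 + 2**-10.
x = 1.0 + 2.0**-11 + 2.0**-30

# roundToEven: going through f32 first loses the tiny 2**-30 term, which turns
# the value into an exact tie that then rounds to the even neighbour.
direct_rne = to_f16_rne(x)                 # f64 -> f16
two_step_rne = to_f16_rne(to_f32_rne(x))   # f64 -> f32 -> f16

# roundToZero: truncating in two steps gives the same result as one step.
# f32 has a 24-bit significand, f16 an 11-bit one (including the hidden bit).
direct_rtz = trunc_to_bits(x, 11)
two_step_rtz = trunc_to_bits(trunc_to_bits(x, 24), 11)

print(direct_rne, two_step_rne)   # the two values differ
print(direct_rtz, two_step_rtz)   # the two values agree
```

For this input the roundToEven results differ between the direct and the two-step path, while the roundToZero results agree, which is the composability property mentioned above.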

