Open sunfishcode opened 7 years ago
It's a good question. My guess is that with SIMD the performance would get pretty close, but I haven't been able to verify that.
Personally, I think this is a common enough operation that it is useful to have an instruction for it, rather than requiring every wasm module to include its own optimized version. Perhaps that's not an issue if everyone uses the same libc -- though it's likely that only the WebAssembly VM will know the optimal instruction sequence for its host architecture.
I guess I'm still a bit unsure of the utility myself. :-)
This comparison, when done, should cover multiple architectures, so that we can also see which approach has better performance portability.
With SIMD as an active proposal, what are the expectations for the performance of future SIMD-optimized implementations of the proposed operations?