Hi!

esl_sse_select_ps has an equivalent intrinsic available in SSE4.1, with double the throughput (2/3 cycles for the SSE4 version with 1 instruction, 3 cycles for the SSE version using 3 instructions), so it can be used when Easel is compiled with SSE4 support. I added a unit test to make sure both versions work the same.
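
For reviewers, here is a minimal sketch of the idea, not the exact patch. It assumes the SSE4.1 intrinsic in question is `_mm_blendv_ps`, that the existing three-instruction version is the usual and/andnot/or pattern, and that SSE4.1 support can be detected with the compiler's `__SSE4_1__` macro (Easel's build may use its own configuration macro instead). The names `select_ps_sse`, `select_ps`, and `select_versions_agree` are hypothetical and only used for illustration.

```c
#include <string.h>      /* memcmp */
#include <xmmintrin.h>   /* SSE: _mm_and_ps, _mm_andnot_ps, _mm_or_ps */
#ifdef __SSE4_1__
#include <smmintrin.h>   /* SSE4.1: _mm_blendv_ps */
#endif

/* SSE version (hypothetical name): three instructions.
 * Per lane, returns a where mask is all zeros and b where mask is all ones. */
static inline __m128
select_ps_sse(__m128 a, __m128 b, __m128 mask)
{
  b = _mm_and_ps(b, mask);      /* keep b where mask bits are set   */
  a = _mm_andnot_ps(mask, a);   /* keep a where mask bits are clear */
  return _mm_or_ps(a, b);
}

/* Dispatching version (hypothetical name): one instruction when the
 * compiler targets SSE4.1, otherwise the SSE fallback above. */
static inline __m128
select_ps(__m128 a, __m128 b, __m128 mask)
{
#ifdef __SSE4_1__
  return _mm_blendv_ps(a, b, mask);   /* picks b where the mask sign bit is set */
#else
  return select_ps_sse(a, b, mask);
#endif
}

/* Sketch of a unit test: returns 1 if both versions give bitwise
 * identical results for a comparison-generated mask, 0 otherwise. */
static int
select_versions_agree(void)
{
  __m128 a    = _mm_setr_ps( 1.0f,  2.0f,  3.0f,  4.0f);
  __m128 b    = _mm_setr_ps(-1.0f, -2.0f, -3.0f, -4.0f);
  __m128 mask = _mm_cmplt_ps(a, _mm_set1_ps(2.5f));  /* lanes 0 and 1 are all ones */
  float  r1[4], r2[4];

  _mm_storeu_ps(r1, select_ps_sse(a, b, mask));
  _mm_storeu_ps(r2, select_ps(a, b, mask));
  return memcmp(r1, r2, sizeof r1) == 0;
}
```

One caveat with this sketch: `_mm_blendv_ps` looks only at the sign bit of each mask lane, while the and/andnot/or version uses every bit. The two agree when each mask lane is all zeros or all ones, which is what the SSE comparison intrinsics produce.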