EddyRivasLab / easel

Sequence analysis library used by Eddy/Rivas lab code

Use intrinsic for `esl_sse_select_ps` if compiling for SSE4 #59

Closed by althonos 3 years ago

althonos commented 3 years ago

Hi!

`esl_sse_select_ps` has an equivalent intrinsic available in SSE4.1 with double the throughput: the SSE4 version is a single instruction taking 2/3 cycles, while the SSE version uses 3 instructions taking 3 cycles. The intrinsic can therefore be used whenever Easel is compiled with SSE4 support. I added a unit test to make sure both versions work the same.
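For readers who want to see the idea in isolation, here is a minimal sketch, not the actual patch. It assumes the SSE4.1 intrinsic in question is `_mm_blendv_ps` and that the existing SSE version is the usual and/andnot/or select. The names `select_ps_sse`, `select_ps`, and `select_versions_agree` are made up for this illustration and are not part of Easel. Note that `_mm_blendv_ps` only looks at the sign bit of each mask lane, so the two versions should agree only for masks whose lanes are all ones or all zeros, such as those produced by the comparison intrinsics.

```c
#include <string.h>
#include <emmintrin.h>            /* SSE / SSE2 intrinsics */
#ifdef __SSE4_1__
#include <smmintrin.h>            /* SSE4.1: _mm_blendv_ps */
#endif

/* Plain SSE select (assumed shape of the existing code):
 * per lane, take b where mask is set and a where it is clear. */
static __m128 select_ps_sse(__m128 a, __m128 b, __m128 mask)
{
  b = _mm_and_ps(b, mask);        /* keep b where mask bits are 1 */
  a = _mm_andnot_ps(mask, a);     /* keep a where mask bits are 0 */
  return _mm_or_ps(a, b);
}

/* Hypothetical wrapper: one blend instruction when the compiler
 * targets SSE4.1, otherwise the three-instruction fallback. */
static __m128 select_ps(__m128 a, __m128 b, __m128 mask)
{
#ifdef __SSE4_1__
  return _mm_blendv_ps(a, b, mask);
#else
  return select_ps_sse(a, b, mask);
#endif
}

/* Returns 1 if both versions give bitwise-identical results
 * on a comparison-generated mask, 0 otherwise. */
static int select_versions_agree(void)
{
  __m128 a    = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);   /* lanes 0..3 = 1,2,3,4 */
  __m128 b    = _mm_set_ps(8.0f, 7.0f, 6.0f, 5.0f);   /* lanes 0..3 = 5,6,7,8 */
  __m128 mask = _mm_cmpgt_ps(b, _mm_set1_ps(6.5f));   /* set where b > 6.5    */
  float r1[4], r2[4];

  _mm_storeu_ps(r1, select_ps_sse(a, b, mask));
  _mm_storeu_ps(r2, select_ps(a, b, mask));
  return memcmp(r1, r2, sizeof r1) == 0;
}
```

Compiling with something like `-msse4.1` (GCC/Clang) defines `__SSE4_1__` and selects the blend path. Without it, the wrapper falls back to the SSE version, so the check passes trivially.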