AdamNiederer / faster

SIMD for humans
Mozilla Public License 2.0

faster floor/ceil/round in pre SSE 4.1 cases #44

Open jackmott opened 6 years ago

jackmott commented 6 years ago

If I am reading the code correctly, in the SSE2 case Faster currently falls back to calling round(), floor(), etc. on each lane individually via the fallback macro.

You may be able to use these methods instead: http://dss.stephanierct.com/DevBlog/?p=8

Or Agner Fog has a different method in his vector library: http://www.agner.org/optimize/vectorclass.zip

Edit: Agner's functions are slower but can handle floating-point values that don't fit in an i32; the functions from the first link only handle values that do fit in an i32.
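
To make the i32 restriction concrete, here is a rough Rust sketch of the truncate-and-correct idea: convert to i32 with truncation, convert back, then fix up the lanes where truncation went the wrong way. This is only an illustration under stated assumptions, not the exact code from either link and not Faster's API. The names `floor4` and `ceil4` are hypothetical, and inputs are assumed to be non-NaN and within i32 range.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Floor of four f32 lanes using only SSE/SSE2 instructions.
/// Assumption: every lane fits in an i32 and is not NaN.
#[cfg(target_arch = "x86_64")]
pub fn floor4(x: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    // SSE2 is part of the x86_64 baseline, so no runtime feature check is done here.
    unsafe {
        let v = _mm_loadu_ps(x.as_ptr());
        // Truncate toward zero, then convert back to float.
        let t = _mm_cvtepi32_ps(_mm_cvttps_epi32(v));
        // Truncation rounds negative non-integers up, so find lanes where t > v.
        let too_big = _mm_cmpgt_ps(t, v);
        // Subtract 1.0 only in those lanes (the mask is all ones or all zeros).
        let r = _mm_sub_ps(t, _mm_and_ps(too_big, _mm_set1_ps(1.0)));
        _mm_storeu_ps(out.as_mut_ptr(), r);
    }
    out
}

/// Ceiling of four f32 lanes, same assumptions as `floor4`.
#[cfg(target_arch = "x86_64")]
pub fn ceil4(x: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    unsafe {
        let v = _mm_loadu_ps(x.as_ptr());
        let t = _mm_cvtepi32_ps(_mm_cvttps_epi32(v));
        // Truncation rounds positive non-integers down, so find lanes where t < v.
        let too_small = _mm_cmplt_ps(t, v);
        // Add 1.0 only in those lanes.
        let r = _mm_add_ps(t, _mm_and_ps(too_small, _mm_set1_ps(1.0)));
        _mm_storeu_ps(out.as_mut_ptr(), r);
    }
    out
}

// Scalar stand-ins so the sketch still compiles on other architectures.
#[cfg(not(target_arch = "x86_64"))]
pub fn floor4(x: [f32; 4]) -> [f32; 4] {
    x.map(f32::floor)
}

#[cfg(not(target_arch = "x86_64"))]
pub fn ceil4(x: [f32; 4]) -> [f32; 4] {
    x.map(f32::ceil)
}
```

For example, `floor4([1.5, -1.5, -2.0, -0.5])` should give `[1.0, -2.0, -2.0, -1.0]`. Lanes outside the i32 range would come back wrong, because the truncating conversion returns the 0x80000000 sentinel for them, which is the limitation mentioned in the edit above.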