Closed by akielaries 9 months ago
Many processors support several instruction sets (MMX, SSE, SSE2, SSE3, AVX, AVX2...). Determine how to select the highest-order one available, since those seem to be the fastest thanks to their wider registers.
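One possible way to pick the highest-order set at runtime is sketched below. It assumes GCC or Clang on x86, where the `__builtin_cpu_supports` builtin is available; the `SimdLevel` enum and both function names are hypothetical and are not part of openGPMP. On other compilers or architectures the sketch simply reports a scalar fallback.

```cpp
#include <cassert>

// Assumption: the CPU-detection builtins exist only on GCC/Clang targeting x86.
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#define SKETCH_HAS_CPU_DETECT 1
#else
#define SKETCH_HAS_CPU_DETECT 0
#endif

// Hypothetical ordering of instruction sets, lowest to highest.
enum class SimdLevel { Scalar = 0, SSE2, SSE3, AVX, AVX2 };

// Check from widest to narrowest and return the first supported set.
SimdLevel detect_simd_level() {
#if SKETCH_HAS_CPU_DETECT
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return SimdLevel::AVX2;
    if (__builtin_cpu_supports("avx"))  return SimdLevel::AVX;
    if (__builtin_cpu_supports("sse3")) return SimdLevel::SSE3;
    if (__builtin_cpu_supports("sse2")) return SimdLevel::SSE2;
#endif
    return SimdLevel::Scalar;
}

// Human-readable name, e.g. for logging which path was chosen.
const char* simd_level_name(SimdLevel level) {
    switch (level) {
        case SimdLevel::AVX2:   return "AVX2";
        case SimdLevel::AVX:    return "AVX";
        case SimdLevel::SSE3:   return "SSE3";
        case SimdLevel::SSE2:   return "SSE2";
        case SimdLevel::Scalar: return "scalar";
    }
    return "unknown";
}
```

The result could be computed once at startup and cached, so the per-call cost of choosing a kernel stays at a single comparison.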
Include the individual ISA headers instead of immintrin.h as a whole?
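As a rough illustration of the per-header idea: with GCC and Clang the SSE family has its own headers (xmmintrin.h for SSE, emmintrin.h for SSE2, pmmintrin.h for SSE3), while the AVX headers are meant to be reached through immintrin.h only, so the split mostly applies to the SSE levels. The sketch below pulls in only xmmintrin.h and falls back to a plain loop when SSE is not enabled at compile time; `add_f32` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Include only the SSE header, and only when the compiler targets SSE.
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Element-wise out[i] = a[i] + b[i] for n floats.
void add_f32(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    std::size_t i = 0;
#if defined(__SSE__)
    // Process four floats per iteration using unaligned loads and stores.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar tail, which is also the whole loop when SSE is unavailable.
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Whether narrower includes measurably help build times or only tidy up dependencies would need to be checked on the actual code base.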
Benchmark against OpenBLAS. openGPMP outperforms it on Skylake but not on Xeon, except for the gpmp Fortran routines... dig into this.
Look into SSE and AVX intrinsics support for x86 platforms.
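A minimal sketch of what an AVX path with a portable fallback might look like follows. It assumes GCC or Clang on x86, using the `target("avx")` function attribute so the file builds without `-mavx`, and `__builtin_cpu_supports` to choose the path at runtime. The names `scale`, `scale_avx` and `scale_scalar` are hypothetical and are not openGPMP routines; MSVC would need a different detection mechanism.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: target attributes and CPU builtins exist only on GCC/Clang for x86.
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_X86 1
#else
#define SKETCH_X86 0
#endif

// Portable fallback: multiply every element by s.
static void scale_scalar(float* x, float s, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) x[i] *= s;
}

#if SKETCH_X86
// AVX path: eight floats per 256-bit register, compiled for AVX only here.
__attribute__((target("avx")))
static void scale_avx(float* x, float s, std::size_t n) {
    const __m256 vs = _mm256_set1_ps(s);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        _mm256_storeu_ps(x + i, _mm256_mul_ps(_mm256_loadu_ps(x + i), vs));
    }
    for (; i < n; ++i) x[i] *= s;  // leftover elements
}
#endif

// Public entry point: use AVX when the running CPU reports it.
void scale(float* x, float s, std::size_t n) {
    assert(n == 0 || x != nullptr);
#if SKETCH_X86
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx")) {
        scale_avx(x, s, n);
        return;
    }
#endif
    scale_scalar(x, s, n);
}
```

Keeping the intrinsics behind one entry point like this means non-x86 builds still compile and run the scalar loop.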