Open TysonAndre opened 2 years ago
`-march=native` may help more than hand-written intrinsics in some cases: what works well on some microarchitectures may be worse on others (e.g. Skylake).
Performance seems to vary. For example, a `_mm_loadu_si128`-based approach does worse on Skylake than GCC's output for the original implementation of `teds_intvector_is_sorted_int16_t`. Intrinsics reference: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
https://en.wikipedia.org/wiki/Broadwell_(microarchitecture)
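To make the comparison concrete, here is a minimal sketch of the two approaches being weighed. It is not the actual teds code: the function names `is_sorted_int16_scalar` and `is_sorted_int16_sse2`, their signatures, and the choice of a non-decreasing order check are all assumptions for illustration. The scalar loop is the kind of code the compiler may auto-vectorize on its own (e.g. with `gcc -O3 -march=native`), while the second function does the same check by hand with SSE2 intrinsics.

```c
#include <emmintrin.h> /* SSE2 intrinsics (x86/x86-64 only) */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar baseline: true if values[0..n) is non-decreasing.
 * The compiler is free to vectorize this for the target CPU. */
static bool is_sorted_int16_scalar(const int16_t *values, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (values[i - 1] > values[i]) {
            return false;
        }
    }
    return true;
}

/* Hypothetical hand-written variant: checks 8 adjacent pairs per iteration
 * using two overlapping unaligned 128-bit loads. */
static bool is_sorted_int16_sse2(const int16_t *values, size_t n)
{
    size_t i = 0;
    /* Each iteration reads values[i .. i+8], so it needs i + 9 <= n. */
    for (; i + 9 <= n; i += 8) {
        __m128i cur  = _mm_loadu_si128((const __m128i *)(values + i));
        __m128i next = _mm_loadu_si128((const __m128i *)(values + i + 1));
        /* Lanes where cur > next (signed 16-bit compare) become 0xFFFF. */
        __m128i out_of_order = _mm_cmpgt_epi16(cur, next);
        if (_mm_movemask_epi8(out_of_order) != 0) {
            return false;
        }
    }
    /* Scalar tail: the first unchecked pair is (values[i], values[i + 1]). */
    for (size_t j = i + 1; j < n; j++) {
        if (values[j - 1] > values[j]) {
            return false;
        }
    }
    return true;
}
```

Which version is faster would need to be benchmarked per CPU and per set of compiler flags, since the sketch makes no claim about relative performance on any particular microarchitecture.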