Closed: bluss closed this 7 years ago.
In sgemm (f32), use an 8x8 kernel when AVX is enabled. (This is open to more platform-specific tuning.)
Improvement from the 4x8 sgemm kernel to the 8x8 kernel with AVX:
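The rule above amounts to choosing the micro-kernel tile shape (MR x NR) by CPU feature: 8x8 with AVX, otherwise the existing 4x8. Below is a minimal Rust sketch of that choice. It assumes runtime detection via std's `is_x86_feature_detected!`, although the PR does not say whether selection happens at runtime or at compile time. `KernelShape`, `select_sgemm_shape`, `kernel_ref` and `run_kernel` are hypothetical names. `kernel_ref` is a plain scalar reference loop over packed panels, not the crate's AVX intrinsics kernel, and it omits the alpha/beta scaling a real sgemm applies.

```rust
/// Hypothetical: micro-kernel tile shape (MR rows x NR columns of C).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct KernelShape {
    pub mr: usize,
    pub nr: usize,
}

/// Hypothetical: pick the sgemm tile shape at runtime.
/// 8x8 when AVX is detected, otherwise fall back to 4x8.
pub fn select_sgemm_shape() -> KernelShape {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx") {
            return KernelShape { mr: 8, nr: 8 };
        }
    }
    KernelShape { mr: 4, nr: 8 }
}

/// Scalar reference micro-kernel: C[MR x NR] += A_panel * B_panel.
/// `a` is a packed MR-wide panel laid out as a[p * MR + i],
/// `b` is a packed NR-wide panel laid out as b[p * NR + j],
/// `c` is row-major with row stride `rsc`.
pub fn kernel_ref<const MR: usize, const NR: usize>(
    k: usize,
    a: &[f32],
    b: &[f32],
    c: &mut [f32],
    rsc: usize,
) {
    assert!(a.len() >= k * MR && b.len() >= k * NR);
    // Accumulate the whole tile in a local block (registers in a real kernel).
    let mut ab = [[0.0f32; NR]; MR];
    for p in 0..k {
        let ap = &a[p * MR..p * MR + MR];
        let bp = &b[p * NR..p * NR + NR];
        for i in 0..MR {
            for j in 0..NR {
                ab[i][j] += ap[i] * bp[j];
            }
        }
    }
    // Add the finished tile into C.
    for i in 0..MR {
        for j in 0..NR {
            c[i * rsc + j] += ab[i][j];
        }
    }
}

/// Dispatch to the monomorphized kernel matching the selected shape.
pub fn run_kernel(shape: KernelShape, k: usize, a: &[f32], b: &[f32], c: &mut [f32], rsc: usize) {
    match (shape.mr, shape.nr) {
        (8, 8) => kernel_ref::<8, 8>(k, a, b, c, rsc),
        (4, 8) => kernel_ref::<4, 8>(k, a, b, c, rsc),
        _ => unreachable!("unsupported kernel shape"),
    }
}

fn main() {
    let shape = select_sgemm_shape();
    let k = 3;
    let a = vec![1.0f32; k * shape.mr];
    let b = vec![2.0f32; k * shape.nr];
    let mut c = vec![0.0f32; shape.mr * shape.nr];
    run_kernel(shape, k, &a, &b, &mut c, shape.nr);
    // Every entry is sum over p of 1.0 * 2.0, i.e. 2 * k = 6.
    assert!(c.iter().all(|&x| x == 6.0));
    println!("using {}x{} sgemm kernel", shape.mr, shape.nr);
}
```

One plausible reason the 8x8 shape suits AVX: an 8x8 f32 accumulator tile is 64 floats, which fits exactly in eight 256-bit registers of eight f32 lanes each.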
| name | old-f32 ns/iter | new-f32 ns/iter | diff ns/iter | diff % |
|---|---:|---:|---:|---:|
| mat_mul_f32::m004 | 110 | 103 | -7 | -6.36% |
| mat_mul_f32::m005 | 162 | 117 | -45 | -27.78% |
| mat_mul_f32::m006 | 170 | 136 | -34 | -20.00% |
| mat_mul_f32::m007 | 191 | 148 | -43 | -22.51% |
| mat_mul_f32::m008 | 211 | 163 | -48 | -22.75% |
| mat_mul_f32::m009 | 371 | 346 | -25 | -6.74% |
| mat_mul_f32::m012 | 484 | 461 | -23 | -4.75% |
| mat_mul_f32::m016 | 702 | 605 | -97 | -13.82% |
| mat_mul_f32::m032 | 3,513 | 3,013 | -500 | -14.23% |
| mat_mul_f32::m064 | 20,804 | 18,757 | -2,047 | -9.84% |
| mat_mul_f32::m127 | 143,522 | 124,790 | -18,732 | -13.05% |
| mat_mul_f32::m256 | 1,029,880 | 904,001 | -125,879 | -12.22% |
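The diff columns in the table are consistent with diff = new - old and diff % = diff / old * 100 (for example, m005: -45 / 162 ≈ -27.78%). A small Rust sketch that recomputes them for a few rows; the `Row` struct and `diff` function are hypothetical helpers, not part of any benchmarking tool.

```rust
/// Hypothetical: one benchmark row with old/new times in ns/iter.
struct Row {
    name: &'static str,
    old_ns: u64,
    new_ns: u64,
}

/// Absolute change (new - old) and relative change in percent of old.
/// Negative values mean the new kernel is faster.
fn diff(old_ns: u64, new_ns: u64) -> (i64, f64) {
    let d = new_ns as i64 - old_ns as i64;
    let pct = d as f64 / old_ns as f64 * 100.0;
    (d, pct)
}

fn main() {
    // Values copied from the benchmark table.
    let rows = [
        Row { name: "mat_mul_f32::m004", old_ns: 110, new_ns: 103 },
        Row { name: "mat_mul_f32::m005", old_ns: 162, new_ns: 117 },
        Row { name: "mat_mul_f32::m256", old_ns: 1_029_880, new_ns: 904_001 },
    ];
    for r in &rows {
        let (d, pct) = diff(r.old_ns, r.new_ns);
        println!("{:<20} {:>10} {:>10} {:>10} {:>7.2}%", r.name, r.old_ns, r.new_ns, d, pct);
    }
}
```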