clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL
Apache License 2.0

improve big sgemm column NN perf. improve small sgemm NN perf. #87

Closed TimmyLiu closed 9 years ago

TimmyLiu commented 9 years ago

Improve big sgemm column-major NN perf: replace barrier() with mem_fence() in the inner loop.

Improve small sgemm NN perf: for small sgemm (M*N < 900*900) where M or N is not a multiple of 32, use the kernel with micro tile size 2 by 2 instead of the one with micro tile size 6 by 6.

Note: kernels with other micro tile sizes might perform better than these two cases. A finer-tuned heuristic for switching between kernels would also be good to have.
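The small-sgemm rule described above can be sketched roughly as follows. This is only an illustration of the stated condition, not the code from this pull request: the function name `select_micro_tile` and the constants `SMALL_SGEMM_LIMIT` and `TILE_ALIGNMENT` are hypothetical and do not come from clBLAS.

```c
#include <stddef.h>

/* Hypothetical constants taken from the thresholds quoted in the description. */
enum { SMALL_SGEMM_LIMIT = 900 * 900, TILE_ALIGNMENT = 32 };

/*
 * Hypothetical helper: returns the micro tile edge length to use for an
 * NN sgemm with an M x N output (2 means a 2x2 micro tile, 6 means 6x6).
 */
static unsigned select_micro_tile(size_t m, size_t n)
{
    /* "Small" means the output has fewer than 900*900 elements. */
    int is_small = (m * n) < (size_t)SMALL_SGEMM_LIMIT;

    /* "Unaligned" means M or N is not a multiple of 32. */
    int is_unaligned = (m % TILE_ALIGNMENT != 0) || (n % TILE_ALIGNMENT != 0);

    /* Small, unaligned problems take the 2x2 kernel; everything else keeps 6x6. */
    return (is_small && is_unaligned) ? 2u : 6u;
}
```

Under these assumptions, a 100 by 100 problem would select the 2 by 2 kernel, while 128 by 128 (both multiples of 32) and 1000 by 1000 (too large) would stay on the 6 by 6 kernel. A finer-tuned heuristic could extend the same function with more size ranges or tile sizes.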