edbennett / performant-numpy

A lesson on ways to write Numpy code that is performant

Mention BLAS as reason for numba lagging behind? #12

Open chillenzer opened 1 year ago

chillenzer commented 1 year ago

At the end of Generalised ufuncs, there is a short comment mentioning that NumPy is better at matrix multiplication than a naive for loop. We could add another sentence briefly explaining that naive matrix multiplication is very cache-inefficient and (very roughly) how BLAS gets around that. (And that NumPy uses BLAS, of course.)
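
For illustration, a rough sketch of what such an addition could show. The function names `matmul_naive` and `matmul_blocked` and the `block` parameter are made up for this example and are not taken from the lesson. The tiled version only mimics the loop structure that BLAS-style blocking uses; real BLAS implementations also rely on hand-tuned kernels, vectorisation and packing, so this is not how BLAS is actually written.

```python
import numpy as np


def matmul_naive(a, b):
    """Textbook triple loop (hypothetical helper for illustration).

    For C-ordered arrays the innermost loop steps through b[p, j] with a
    stride of one full row, so consecutive reads are far apart in memory.
    """
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    out = np.zeros((n, m), dtype=np.result_type(a, b))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i, j] += a[i, p] * b[p, j]
    return out


def matmul_blocked(a, b, block=32):
    """Tiled loop structure (hypothetical helper for illustration).

    The work is split into block x block tiles so that each tile of a, b
    and out can be reused while it is still in cache. The per-tile product
    is delegated to NumPy here purely to keep the sketch short.
    """
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    out = np.zeros((n, m), dtype=np.result_type(a, b))
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                # Slices past the array edge are truncated by NumPy.
                out[i0:i0 + block, j0:j0 + block] += (
                    a[i0:i0 + block, p0:p0 + block]
                    @ b[p0:p0 + block, j0:j0 + block]
                )
    return out
```

Both functions should agree with `a @ b` up to floating-point rounding. Timing them in pure Python says little about cache behaviour, since interpreter overhead dominates; the point is only the difference in access pattern, which matters once the loops are compiled (for example with Numba).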