At the end of Generalised ufuncs, there is a short comment noting that numpy is better at matrix multiplication than a naive for loop. We could add another sentence briefly explaining that naive matrix multiplication is very cache-inefficient and (very roughly) how BLAS gets around that. (And that numpy uses BLAS, of course.)
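To make the suggestion concrete, here is a rough sketch of the kind of example that sentence could point at. The function names (`matmul_naive`, `matmul_blocked`) and the `block` parameter are made up for illustration and are not from the existing material. For row-major arrays, the naive inner loop walks down a column of `b`, so consecutive reads are a full row apart in memory. The blocked version only shows the tiling idea, working on small sub-blocks that can stay in cache; it is not how BLAS is actually implemented, since real BLAS libraries also pack data and use hand-tuned kernels.

```python
import numpy as np


def matmul_naive(a, b):
    """Textbook triple loop; the inner loop strides down a column of b."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m), dtype=np.result_type(a, b))
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                # b[p, j] jumps one full row of b per step for row-major b
                acc += a[i, p] * b[p, j]
            out[i, j] = acc
    return out


def matmul_blocked(a, b, block=32):
    """Illustrative tiling: accumulate products of small sub-blocks.

    Assumption: `block` is a hypothetical tile size chosen for illustration,
    not a tuned value. The per-tile product is delegated to numpy to keep the
    sketch short.
    """
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m), dtype=np.result_type(a, b))
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for p0 in range(0, k, block):
                # Slices past the array edge are clipped by numpy.
                a_tile = a[i0:i0 + block, p0:p0 + block]
                b_tile = b[p0:p0 + block, j0:j0 + block]
                out[i0:i0 + block, j0:j0 + block] += a_tile @ b_tile
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((50, 40))
    b = rng.standard_normal((40, 30))
    print(np.allclose(matmul_naive(a, b), a @ b))
    print(np.allclose(matmul_blocked(a, b, block=16), a @ b))
```

In pure Python the interpreter overhead dominates, so this will not show the cache effect in timings; it only illustrates the loop structure that a one-sentence explanation could refer to.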