JuliaLinearAlgebra / Octavian.jl

Multi-threaded BLAS-like library that provides pure Julia matrix multiplication
https://julialinearalgebra.github.io/Octavian.jl/stable/

Dual-socket support #151

Open carstenbauer opened 2 years ago

carstenbauer commented 2 years ago

In my recent dgemm comparison benchmarks (on a Zen 3 AMD Milan system), I find that Octavian essentially does not scale at all from single-socket to dual-socket. In the table below, 64 cores corresponds to one full socket and 128 cores to the full dual-socket system.

| BLAS | # cores | size | GFLOPS |
|---|---|---|---|
| Intel MKL v2022.0.0 | 128 cores | 10240 | 3279 |
| Intel MKL v2022.0.0 | 64 cores | 10240 | 1684 |
| BLIS 0.9.0 | 128 cores | 10240 | 3893 |
| BLIS 0.9.0 | 64 cores | 10240 | 2014 |
| Octavian 0.3.15 | 128 cores | 10240 | 1843 |
| Octavian 0.3.15 | 64 cores | 10240 | 1802 |

Would be great to see Octavian perform better here :)
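
To put a number on "not scaling", here is a small sketch that computes the dual-socket speedup (128-core GFLOPS divided by 64-core GFLOPS) from the figures reported above. The GFLOPS values are copied from the benchmark; the dictionary layout and the function name `dual_socket_speedup` are just illustrative choices. Perfect scaling would give a ratio of 2.0.

```python
# GFLOPS reported above for size 10240, keyed by library and core count.
GFLOPS = {
    "Intel MKL v2022.0.0": {64: 1684, 128: 3279},
    "BLIS 0.9.0": {64: 2014, 128: 3893},
    "Octavian 0.3.15": {64: 1802, 128: 1843},
}


def dual_socket_speedup(results):
    """Return the 128-core / 64-core throughput ratio for one library."""
    return results[128] / results[64]


speedups = {name: dual_socket_speedup(r) for name, r in GFLOPS.items()}

for name, s in speedups.items():
    # Parallel efficiency relative to the ideal 2x from doubling the cores.
    print(f"{name}: speedup {s:.2f}x, efficiency {s / 2:.0%}")
```

MKL and BLIS both land a little under 2x, while the Octavian ratio stays close to 1x, which matches the observation that the second socket is contributing almost nothing.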

chriselrod commented 2 years ago

With this CPUSummary.jl commit, dual sockets should be supported: https://github.com/JuliaSIMD/CPUSummary.jl/commit/d93cf1c1765c37c9fbe809b68a3e5f10fb6bb458

However, Octavian also does not support more than 64 threads: https://github.com/JuliaLinearAlgebra/Octavian.jl/blob/ccd903373524827e92ddb4f68967529e4761626b/src/matmul.jl#L378

That line will have to be replaced with the more usual PolyesterWeave.request_threads, which returns a tuple. Octavian will then have to iterate over the elements of that tuple when launching threads.

Apparently I never got around to doing that, and just accessed some internals to only get the first element of the tuple, rather than the entire tuple.

Each element of the tuple corresponds to a set of 64 threads.
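
As a rough illustration of why reading only the first tuple element caps things at 64 threads, here is a language-neutral sketch in Python. It is not PolyesterWeave's actual API or data layout: it simply assumes each tuple element is a 64-bit mask with one bit per thread, and the helpers `thread_masks`, `iter_threads`, and `first_element_only` are hypothetical names invented for this example.

```python
MASK_BITS = 64  # assumption: one tuple element covers 64 thread ids


def thread_masks(thread_ids):
    """Pack thread ids into a tuple of 64-bit masks.

    Element k has bit b set when thread id 64*k + b is present.
    """
    if not thread_ids:
        return ()
    n_words = max(thread_ids) // MASK_BITS + 1
    words = [0] * n_words
    for tid in thread_ids:
        words[tid // MASK_BITS] |= 1 << (tid % MASK_BITS)
    return tuple(words)


def iter_threads(masks):
    """Yield every thread id encoded in the tuple, walking all elements."""
    for k, mask in enumerate(masks):
        while mask:
            low = mask & -mask  # isolate the lowest set bit
            yield k * MASK_BITS + low.bit_length() - 1
            mask ^= low  # clear that bit and continue


def first_element_only(masks):
    """Mimic the shortcut described above: look at element 0 and ignore the rest."""
    return list(iter_threads(masks[:1]))


masks = thread_masks(range(128))       # a full dual-socket machine
all_ids = list(iter_threads(masks))    # iterating the whole tuple
capped_ids = first_element_only(masks) # only the first 64 threads are seen
```

Under these assumptions, `all_ids` covers all 128 threads while `capped_ids` stops at 64, which is the behaviour the fix needs to remove by looping over every element when launching work.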

Then, of course, #152 is a third issue.