JuliaLinearAlgebra / Octavian.jl

Multi-threaded BLAS-like library that provides pure Julia matrix multiplication
https://julialinearalgebra.github.io/Octavian.jl/stable/

Bad performance for large matrices when work is unbalanced #104

Open chriselrod opened 3 years ago

chriselrod commented 3 years ago

Currently, I suspect `syncmul!` is the problem.

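```julia
# Start Julia with e.g. `julia -t 16` or `julia -t 18` to compare thread counts.
using Octavian
M = K = N = 10_000;
A = rand(M, K); B = rand(K, N);
C = Array{Float64}(undef, M, N);  # uninitialized output buffer
matmul!(C, A, B);                 # warm-up call so @time excludes compilation
@time matmul!(C, A, B);
```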

On a computer with 18 threads, running Julia with 18 threads while OBS Studio ran in the background was roughly 2x slower than running Julia with 16 threads.
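A toy model of why one busy core can cost this much. It assumes, hypothetically, that the work is split into equal per-thread chunks followed by a synchronization point, and that the background app takes half of one core. Under those assumptions the slowest thread sets the runtime. The functions `static_makespan` and `dynamic_makespan` are illustrative only and do not describe Octavian's actual scheduler.

```julia
# Hypothetical makespan model (an assumption, not Octavian's scheduler):
# `speeds[i]` is the fraction of a core that thread i actually receives.

# Static partitioning: total work W is split into equal chunks, one per thread,
# and all threads wait at a barrier, so the slowest thread sets the runtime.
function static_makespan(W, speeds)
    chunk = W / length(speeds)
    return maximum(chunk ./ speeds)
end

# Dynamic (greedy) scheduling: `nblocks` equal blocks are each handed to the
# thread that would finish it earliest, so a slow thread just takes fewer blocks.
function dynamic_makespan(W, speeds, nblocks)
    block = W / nblocks
    finish = zeros(length(speeds))   # time at which each thread becomes free
    for _ in 1:nblocks
        i = argmin(finish .+ block ./ speeds)
        finish[i] += block / speeds[i]
    end
    return maximum(finish)
end

W = 1.0
speeds16 = ones(16)            # 16 threads, each with a full core
speeds18 = [ones(17); 0.5]     # 18 threads, one sharing its core with a background app

t16 = static_makespan(W, speeds16)   # 1/16
t18 = static_makespan(W, speeds18)   # (1/18) / 0.5 = 1/9
println("static, 18 vs 16 threads: ", t18 / t16)   # 16/9 ≈ 1.78
println("dynamic, 18 threads: ", dynamic_makespan(W, speeds18, 1000))
```

In this model, static splitting with 18 threads is about 1.78x slower than with 16 threads, which is in the same ballpark as the observed ~2x. With many small dynamically assigned blocks, 18 threads come out slightly ahead of 16 because the slow thread simply does less of the work.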