JuliaLinearAlgebra / Octavian.jl

Multi-threaded BLAS-like library that provides pure Julia matrix multiplication
https://julialinearalgebra.github.io/Octavian.jl/stable/

Using matmul in code threaded by Polyester sometimes gives wrong result or hangs #105

Closed tlauli closed 3 years ago

tlauli commented 3 years ago

The following code either gives incorrect results or hangs.

julia> using Polyester, Octavian

julia> a = [randn(Float32, 128, 128) for _ in 1:4];
julia> b = [randn(Float32, 128, 128) for _ in 1:4];
julia> c = similar(a);
julia> @batch for i in 1:4
               c[i] = matmul(a[i], b[i])
       end

julia> c
4-element Vector{Matrix{Float32}}:
 #undef
 #undef
 #undef
 [-1.0401227 6.1655574 … -5.2372017 -4.046586; -5.9543576 4.8168693 … 9.247597 -32.092808; … ; 7.5443554 3.4970012 … 1.7187954 2.4799664; -1.5603023 -5.0854316 … 3.936642 -8.162479]

When I run the loop a second time in the same REPL, the code hangs. It then ignores several interrupts via ^C, after which Julia crashes with a segmentation fault. I managed to get a stacktrace once, but I am now unable to reproduce it. It led to this function.

If I first run the code with smaller matrices and then gradually increase the size to 128×128 (4 -> 32 -> 64 -> 128), everything works as expected, and c contains the results of all matmuls.

Output of versioninfo:

julia> versioninfo()
Julia Version 1.6.2
Commit 1b93d53fc4 (2021-07-14 15:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4

chriselrod commented 3 years ago

This is because I haven't transitioned Octavian from using ThreadingUtilities directly to using Polyester yet.

tlauli commented 3 years ago

Is it possible to work around this, or do I need to use matmul_serial in threaded code for now?

chriselrod commented 3 years ago

I can probably fix it in a few hours. Otherwise, yes, you'd need matmul_serial (or the in-place matmul_serial!).
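For reference, the workaround mentioned above might look roughly like the following. This is a minimal sketch, not a tested fix for the affected versions: it reuses the setup from the original report and only swaps matmul for matmul_serial. The second output vector d and the in-place loop are illustrative additions, not something from the thread.

```julia
using Polyester, Octavian

a = [randn(Float32, 128, 128) for _ in 1:4]
b = [randn(Float32, 128, 128) for _ in 1:4]

# Allocating variant: each @batch iteration performs a single-threaded
# multiply, so Octavian does not start its own threads inside the batch.
c = similar(a)
@batch for i in 1:4
    c[i] = matmul_serial(a[i], b[i])
end

# In-place variant (illustrative): preallocate the outputs and write
# into them with matmul_serial!.
d = [similar(a[i]) for i in 1:4]
@batch for i in 1:4
    matmul_serial!(d[i], a[i], b[i])
end
```

The outer @batch loop still provides the parallelism across the four products; only the threading inside each multiplication is given up.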

tlauli commented 3 years ago

The Polyester branch fixed the problem. Thank you very much for the quick response and fix.