JuliaLinearAlgebra / Octavian.jl

Multi-threaded BLAS-like library that provides pure Julia matrix multiplication
https://julialinearalgebra.github.io/Octavian.jl/stable/

Wrong matmul results. #129

Closed xijiang closed 2 years ago

xijiang commented 2 years ago

using LinearAlgebra, Octavian
a = rand(2, 3)
b = rand(3, 2)
matmul(a, b) # => 2nd row ~ [0, 0]
c = zeros(2, 2)
matmul!(c, a, b) # => c[2, :] ~[0, 0]
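
To make a wrong result detectable without inspecting the output by eye, the product can be compared against the reference `a * b` from LinearAlgebra. The following is a minimal sketch, not part of Octavian: `mismatched_rows` is a hypothetical helper, the tolerance is an arbitrary choice, and the zeroed second row only simulates the symptom described above.

```julia
using LinearAlgebra

# Hypothetical helper (not an Octavian function): return the indices of the
# rows of `c` that differ from the reference product `a * b`.
function mismatched_rows(c::AbstractMatrix, a::AbstractMatrix, b::AbstractMatrix; rtol = 1e-8)
    ref = a * b
    return [i for i in axes(ref, 1) if !isapprox(c[i, :], ref[i, :]; rtol = rtol)]
end

a = rand(2, 3)
b = rand(3, 2)

# A correct product reports no mismatched rows.
good = a * b
@show mismatched_rows(good, a, b)

# Simulate the reported symptom by zeroing the second row.
broken = a * b
broken[2, :] .= 0
@show mismatched_rows(broken, a, b)
```

With Octavian loaded, the same check would presumably be `mismatched_rows(matmul(a, b), a, b)`.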

Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux) (fedora 35)
  CPU: AMD Ryzen 9 3900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, znver2)
Environment:
  JULIA_NUM_THREADS = 12

The following configuration gave the right results:

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Linux (x86_64-redhat-linux) (centos7)
  CPU: Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

chriselrod commented 2 years ago

I don't have a znver2 CPU and cannot reproduce.

DilumAluthge commented 2 years ago

@xijiang Since you're on Linux, an rr trace might be useful.

chriselrod commented 2 years ago

Also gets the correct answer on zen3:

julia> matmul(a, b) # => 2nd row ~ [0, 0]
2×2 Matrix{Float64}:
 0.43163   0.468602
 0.856061  0.993642

julia> c = zeros(2, 2)
2×2 Matrix{Float64}:
 0.0  0.0
 0.0  0.0

julia> matmul!(c, a, b) # => c[2, :] ~[0, 0]
2×2 Matrix{Float64}:
 0.43163   0.468602
 0.856061  0.993642

julia> versioninfo()
Julia Version 1.8.0-DEV.1184
Commit 722f9d4958 (2021-12-28 14:28 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: AMD EPYC 7513 32-Core Processo

DilumAluthge commented 2 years ago

@xijiang I should clarify: what I meant was for you to run rr and then upload the trace files somewhere so that Chris could download them and take a look himself.

You could do that manually. Alternatively, this script will automate the process of creating a compressed archive containing all the rr trace files: https://github.com/JuliaCI/julia-buildkite/blob/main/utilities/rr/rr_capture.jl

Permalink to the script: https://github.com/JuliaCI/julia-buildkite/blob/f88013a1af85090e1c0715bc146807ebde2a7178/utilities/rr/rr_capture.jl

xijiang commented 2 years ago

Reproduced on two different AMD machines, both running Fedora 35 (F35), with three Julia versions compiled from source:

Version Status
v1.6.5 OK
v1.8.0-dev.1227 OK
v1.7.1 not OK

The result is similar with the Julia package from the Fedora repo.

P.S. I removed my last post while editing this one. @chriselrod, can you please compile v1.7.1 on your machine?

Maybe it is a problem in Julia itself.

P.P.S. v1.7.1 is OK on the i5-6260U.

DilumAluthge commented 2 years ago

Can you reproduce using the official binaries from julialang.org? (The binaries from the Fedora repo are not the official binaries.)

xijiang commented 2 years ago

The binary from julialang.org has the same problem: it is also not OK.

DilumAluthge commented 2 years ago

Can you run an rr trace and send the trace files to Chris, maybe through Slack? The files will be too big to upload to GitHub.

xijiang commented 2 years ago

> Can you run an rr trace and send the trace files to Chris, maybe through Slack? The files will be too big to upload to GitHub.

I wrote a script:

#!/usr/bin/env julia

using Octavian
a = rand(2, 3)
b = rand(3, 2)
@info "matmul" matmul(a, b)

c = zeros(2, 2)
matmul!(c, a, b)
@info "matmul!" c

Then I ran

julia rr_capture.jl tst-octavian.jl

But it was not successful. More instructions, please?

DilumAluthge commented 2 years ago

export JULIA_ALWAYS_SAVE_RR_TRACE="true"

julia rr_capture.jl julia tst-octavian.jl

xijiang commented 2 years ago

On Zen CPUs, rr will not work reliably unless you disable the hardware SpecLockMap optimization. For instructions on how to do this, see https://github.com/rr-debugger/rr/wiki/Zen
rr: Saving execution to trace directory `/home/xijiang/Music/tst-octavian/temp_for_rr/jl_VqocYV/rr_traces/octavian.jl-0'.

Still no luck. I will have a look at the Zen instructions linked above.

DilumAluthge commented 2 years ago

Oh, yeah, you'll need to do the Zen workaround.

I don't know the details. But I think @vchuravy has done those steps before.

DilumAluthge commented 2 years ago

After you do the Zen workaround, if it still doesn't work, can you post the exact commands you ran, along with the full log?

xijiang commented 2 years ago

After applying the workaround with the Python code, I also needed to run

chmod u+x tst.jl

I have made a copy available at http://nmbu.org/tmp/rr--build_----commit_--2022_01_06_21_45_16.tar.zst

Please tell me if you need more.

DilumAluthge commented 2 years ago

@chriselrod See above for rr trace.

chriselrod commented 2 years ago

@xijiang what version of VectorizationBase do you have on each system, especially the one getting an incorrect answer?

I believe you have VectorizationBase 0.19 on the system where it is failing. It should work if you upgrade to a more recent version, e.g. VectorizationBase 0.21.
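
One way to check which version is actually resolved in the active environment is to query Pkg. This is a minimal sketch: `installed_version` is a hypothetical helper name, and the 0.21 threshold is taken from the suggestion above, not from any documented minimum.

```julia
using Pkg

# Hypothetical helper: look up the resolved version of a package by name in
# the active environment. Returns `nothing` if the package is not present.
function installed_version(name::AbstractString)
    for (_, info) in Pkg.dependencies()
        info.name == name && return info.version
    end
    return nothing
end

v = installed_version("VectorizationBase")
if v === nothing
    println("VectorizationBase is not in the active environment")
elseif v < v"0.21"
    println("VectorizationBase $v is older than 0.21; upgrading may help")
else
    println("VectorizationBase $v")
end
```

`Pkg.status("VectorizationBase")` prints similar information interactively.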

xijiang commented 2 years ago

Actually, because Turing prevented upgrading, my VectorizationBase was at 0.16.2. After removing Turing, I was able to upgrade VectorizationBase to 0.21.23, and the problem was solved. Many thanks. Evidently there is a package management issue. I may post a question on Discourse.