JuliaLang / LinearAlgebra.jl

Julia Linear Algebra standard library

Matrix multiplication error in OpenBLAS in Windows?! #1041

Closed droodman closed 10 months ago

droodman commented 12 months ago

Working with @matthieugomez on a problem I was having with FixedEffectModels.jl, I ran into what looks like OpenBLAS committing a matrix multiplication error in Windows on an Intel i9-13900H, as well as in Ubuntu 22.04 under Windows Subsystem for Linux on the same computer.

The error is deterministic: it is the same every time. Setting the number of threads to 1 doesn't affect it. It goes away if I switch from OpenBLAS to MKL. It does not happen on a MacBook with an M2 Pro, nor on a MacBook with an Intel i7-3820QM.

It involves computing X'y where X is an 8830997×41 Matrix{Float64} and y is an 8830997-entry Vector{Float64}. All entries in both arrays are 0.0 or 1.0. In fact X is sparse: each row has at most one 1.0. At least the last entry of the result is wrong.

A compressed .jld file with the two arrays stored in Int16 is here. A demo script is below. The sixth and last lines of the script compare two ways of computing the same quantity; they should be equal, but after conversion from Int16 to Float64 they are not. Look at the 320.0 in the lower left.

# using MKL                        # uncomment to switch the BLAS backend
using JLD
@load "bugdemo.jld"                # loads X and y stored as Int16
typeof(X), typeof(y)
size(X), size(y)
(X'y)[end], (X[:,end]'y)           # Int16: handled by Julia's generic matmul, values agree
_X = convert(Matrix{Float64}, X);
_y = convert(Vector{Float64}, y);
(_X'_y)[end], (_X[:,end]'_y)       # Float64: handled by BLAS, values disagree

Screenshot of running this with and without using MKL, in Julia 1.9.4.

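For readers without the .jld file, here is a minimal self-contained sketch of the same consistency check using synthetic 0/1 data with the same structure (each row of X has at most one 1.0). The shape is scaled down here for convenience; the original report used an 8830997×41 matrix. A discrepancy would only appear on an affected CPU/OpenBLAS combination:

```julia
using Random, LinearAlgebra

# Synthetic stand-in for the bugdemo.jld data (hypothetical data, not the original):
# a 0/1 matrix where each row has at most one 1.0, and a 0/1 vector.
Random.seed!(0)
n, k = 100_000, 41                 # original report: n = 8_830_997
X = zeros(Float64, n, k)
for i in 1:n
    j = rand(0:k)                  # 0 means the row stays all-zero
    j > 0 && (X[i, j] = 1.0)
end
y = Float64.(rand(0:1, n))

# These two values are mathematically identical; on the affected machine
# the first (the BLAS GEMV path) came out wrong.
(X'y)[end], (X[:, end]' * y)
```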

PallHaraldsson commented 12 months ago

I'm not sure this is a loss of precision that you could, or should, expect; maybe you should never expect it, since it seems arch-specific. At the very least this looks like a bug to report upstream at OpenBLAS.

I couldn't access the .zip file to test it on Linux. Most likely this is arch-specific rather than OS/platform-specific.

You could try https://github.com/JuliaLinearAlgebra/BLISBLAS.jl

I would want to get rid of OpenBLAS anyway (for other reasons). If BLISBLAS works, it could be a replacement, with or without bundling it. You could also try, somehow, the generic matmul that Julia has.
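One way to cross-check without swapping BLAS libraries is a hand-written, BLAS-free reference for X'y. This is a sketch for illustration (the function name is made up, not any library API); the final comparison holds whenever the loaded BLAS is behaving, since sums of 0/1 entries are exact in Float64:

```julia
using LinearAlgebra

# BLAS-free reference for X'y, to cross-check whatever BLAS library is loaded.
function xty_reference(X::AbstractMatrix{T}, y::AbstractVector{T}) where {T}
    n, k = size(X)
    length(y) == n || throw(DimensionMismatch("X has $n rows, y has $(length(y)) entries"))
    out = zeros(T, k)
    for j in 1:k
        s = zero(T)
        for i in 1:n
            s += X[i, j] * y[i]   # plain scalar loop, never dispatches to BLAS
        end
        out[j] = s
    end
    return out
end

X = rand(0.0:1.0, 1000, 5)        # random 0/1 test data
y = rand(0.0:1.0, 1000)
xty_reference(X, y) ≈ X'y         # true when BLAS is behaving
```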

droodman commented 12 months ago

Sorry for the trouble with the link, whatever the cause. I will just upload it here: bugdemo.zip

Yes, switching to BLISBLAS also fixes it.

I wonder if this OpenBLAS issue is related.

martin-frbg commented 12 months ago

Not reproduced on a Ryzen 3 (AVX2), so probably a bug in the AVX512 GEMV (micro)kernel.

droodman commented 12 months ago

It also works fine on my nearly identical computer with an i7-10875H.

martin-frbg commented 12 months ago

Hmm, very strange. I misread the model ID initially: that is a Raptor Lake without AVX512, so it should use exactly the same AVX2 BLAS kernels as your i7-10875H (and my Ryzen 3). Incidentally, I cannot reproduce the problem on a "Skylake" Xeon with AVX512 either. Right now I think what could have happened is that your Raptor Lake was not recognized by the old-ish OpenBLAS distributed with 1.9.4, and due to a recently resolved bug it was not even treated as "any" AVX2 CPU but dropped all the way down to the ancient Pentium 4 kernels from the original GotoBLAS.

brada4 commented 11 months ago

Could you check the detected CPU:

using LinearAlgebra
BLAS.openblas_get_config()

droodman commented 11 months ago
julia> using LinearAlgebra

julia> BLAS.openblas_get_config()
ERROR: UndefVarError: `openblas_get_config` not defined
Stacktrace:
 [1] getproperty(x::Module, f::Symbol)
   @ Base .\Base.jl:31
 [2] top-level scope
   @ REPL[2]:1

julia> BLAS.get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
└ [ILP64] libopenblas64_.dll

Windows says: 13th Gen Intel(R) Core(TM) i9-13900H 2.60 GHz

martin-frbg commented 11 months ago

Setting OPENBLAS_VERBOSE=2 in the environment before starting Julia should be at least as reliable. But I had already verified that the 0.3.21 distributed with Julia 1.9.4 could not identify a Raptor Lake CPU, and that due to a logic flaw in its fallback code it would use the ancient SSE kernels for Pentium4/Core2, which indeed show the bug. What I have not been able to do so far is identify and fix the problem in that old SSE kernel; given that compilers have advanced a lot in the past 15 years, and nothing should fall back to these ancient kernels anyway, I am tempted to simply replace it with the generic C implementation.

giordano commented 11 months ago

The old BLAS.openblas_get_config() function can now be reproduced with the more verbose

julia> using OpenBLAS_jll, LinearAlgebra

julia> strip(unsafe_string(ccall((BLAS.@blasfunc(openblas_get_config), libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.25  USE64BITINT DYNAMIC_ARCH NO_AFFINITY armv8 MAX_THREADS=512"

This is probably not useful in this specific case, but I figured I'd drop the tip here for the record since it was mentioned above, in case someone else needs it in the future.

giordano commented 11 months ago

Right now I think what could have happened is that your Raptor was not recognized by the old-ish OpenBLAS distributed with 1.9.4, and due to a recently resolved bug it was not even treated as "any" AVX2 cpu but dropped all the way down to ancient Pentium4 kernels from original GotoBLAS

@droodman would you be able to test with a nightly build of Julia which comes with OpenBLAS v0.3.25?

droodman commented 11 months ago

From Julia 1.9.4:

julia> strip(unsafe_string(ccall((BLAS.@blasfunc(openblas_get_config), libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.21  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott MAX_THREADS=512"

And yes it works on the nightly build:

julia> (_X'_y)[end], (_X[:,end]'_y)
(477.0, 477.0)
brada4 commented 11 months ago

Prescott

This confirms Martin's suspicion that the CPU was not detected; retiring the buggy fallback routine is the fix. You can set OPENBLAS_CORETYPE=HASWELL to use the proper code path for your CPU in the meantime.
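In shell terms (assuming a POSIX shell on Linux/WSL), the workaround looks like this; the variable names are documented OpenBLAS environment variables:

```shell
# Force OpenBLAS to take the Haswell (AVX2) code path instead of the
# misdetected Prescott fallback, and print the selected core type on load.
export OPENBLAS_CORETYPE=HASWELL
export OPENBLAS_VERBOSE=2
# then start Julia as usual, e.g.:
# julia
```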

ViralBShah commented 10 months ago

Can this be closed?

droodman commented 10 months ago

It is working for me now in Julia 1.10.0.

brada4 commented 10 months ago

Please confirm you have OpenBLAS 0.3.26 as in https://github.com/JuliaLang/LinearAlgebra.jl/issues/1041; 0.3.25 has the CPU detection correct, but the faulty archaic code path is still within reach for future CPUs.

ViralBShah commented 10 months ago

We are on 0.3.26 on Julia master.