Closed droodman closed 10 months ago
I'm not sure this is a loss of precision that you could or should expect; arguably you never should, since it seems arch-specific. At the least, this looks like a bug to report upstream at OpenBLAS.
I couldn't access the .zip file to test; I'm on Linux. Most likely this is arch-specific but not OS/platform-specific.
You could try https://github.com/JuliaLinearAlgebra/BLISBLAS.jl
I would like to get rid of OpenBLAS (for other reasons). If BLISBLAS works, then it could be a replacement, with or without bundling it. You could also try, somehow, the generic matmul that Julia has.
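Trying the BLISBLAS route is a one-line change (a sketch, assuming BLISBLAS.jl is installed; like MKL.jl, loading it swaps the libblastrampoline backend from OpenBLAS to BLIS):

```julia
# Loading BLISBLAS forwards subsequent BLAS calls to BLIS instead of OpenBLAS.
using BLISBLAS, LinearAlgebra

# The LBT config should now list libblis among the loaded libraries,
# so re-running the failing X'y computation exercises BLIS kernels.
println(BLAS.get_config())
```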
Sorry for the trouble with the link, whatever the cause. I will just upload it here: bugdemo.zip
Yes, switching to BLISBLAS also fixes it.
I wonder if this OpenBLAS issue is related.
Not reproduced on a Ryzen3 (AVX2), so probably a bug in the AVX512 (micro)kernel for GEMV
It also works fine on my nearly identical computer with an i7-10875H.
Hmm, very strange - I misread the model ID initially; that is a Raptor Lake without AVX512, so it should use exactly the same AVX2 BLAS kernels as your i7-10875H (and my Ryzen 3). And incidentally, I cannot reproduce the problem on a "Skylake" Xeon with AVX512 either. Right now I think what could have happened is that your Raptor Lake was not recognized by the oldish OpenBLAS distributed with 1.9.4, and due to a recently resolved bug it was not even treated as "any" AVX2 CPU but dropped all the way down to the ancient Pentium4 kernels from the original GotoBLAS.
Could you check the detected CPU:
using LinearAlgebra
BLAS.openblas_get_config()
julia> using LinearAlgebra
julia> BLAS.openblas_get_config()
ERROR: UndefVarError: `openblas_get_config` not defined
Stacktrace:
[1] getproperty(x::Module, f::Symbol)
@ Base .\Base.jl:31
[2] top-level scope
@ REPL[2]:1
julia> BLAS.get_config()
LinearAlgebra.BLAS.LBTConfig
Libraries:
└ [ILP64] libopenblas64_.dll
Windows says: 13th Gen Intel(R) Core(TM) i9-13900H 2.60 GHz
Setting OPENBLAS_VERBOSE=2 in the environment before starting Julia should be at least as reliable - but I had already verified that the 0.3.21 distributed with Julia 1.9.4 could not identify a Raptor Lake CPU, and due to a logic flaw in its fallback code would use ancient SSE kernels for Pentium4/Core2 that do indeed show the bug. What I have not been able to do so far is identify and fix the problem in that old SSE kernel. Given that compilers have advanced a lot in the past 15 years and nothing should fall back to these ancient kernels anyway, I am tempted to simply replace it with the generic C implementation.
The old BLAS.openblas_get_config()
function can now be reproduced with the more verbose
julia> using OpenBLAS_jll, LinearAlgebra
julia> strip(unsafe_string(ccall((BLAS.@blasfunc(openblas_get_config), libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.25 USE64BITINT DYNAMIC_ARCH NO_AFFINITY armv8 MAX_THREADS=512"
This is probably not useful in this specific case, but I figured I'd drop the tip here for the record since it was mentioned above, in case someone else needs it again in the future.
Right now I think what could have happened is that your Raptor was not recognized by the old-ish OpenBLAS distributed with 1.9.4, and due to a recently resolved bug it was not even treated as "any" AVX2 cpu but dropped all the way down to ancient Pentium4 kernels from original GotoBLAS
@droodman would you be able to test with a nightly build of Julia which comes with OpenBLAS v0.3.25?
From Julia 1.9.4:
julia> strip(unsafe_string(ccall((BLAS.@blasfunc(openblas_get_config), libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.21 USE64BITINT DYNAMIC_ARCH NO_AFFINITY Prescott MAX_THREADS=512"
And yes it works on the nightly build:
julia> (_X'_y)[end], (_X[:,end]'_y)
(477.0, 477.0)
Prescott
This confirms Martin's suspicion that the CPU was not detected; retiring the buggy fallback routine is the fix. In the meantime, you can set OPENBLAS_CORETYPE=HASWELL to use the proper code path for your CPU.
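To verify the override took effect (a sketch; OPENBLAS_CORETYPE must be in the environment before the OpenBLAS library is loaded, so set it in the shell before launching Julia, e.g. `OPENBLAS_CORETYPE=HASWELL julia`):

```julia
# In a Julia session started with OPENBLAS_CORETYPE=HASWELL set,
# query the config string the same way as earlier in this thread.
using OpenBLAS_jll, LinearAlgebra

cfg = strip(unsafe_string(ccall((BLAS.@blasfunc(openblas_get_config), libopenblas),
                                Ptr{UInt8}, ())))
# A DYNAMIC_ARCH build reports the active core type in this string,
# so it should now name Haswell rather than Prescott.
println(cfg)
```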
Can this be closed?
It is working for me now in Julia 1.10.0....
Please confirm you have OpenBLAS 0.3.26 as in https://github.com/JuliaLang/LinearAlgebra.jl/issues/1041 - 0.3.25 has the cpuid detection correct, but the faulty archaic kernels are still within reach for future CPUs.
We are on 0.3.26 on Julia master.
Working with @matthieugomez on a problem I was having with FixedEffectModels.jl, I ran into what looks like OpenBLAS committing a matrix multiplication error in Windows on an Intel i9-13900H, as well as in Ubuntu 22.04 under Windows Subsystem for Linux on the same computer.
The error is deterministic--same every time. Setting the number of threads to 1 doesn't affect it. It goes away if I switch from OpenBLAS to MKL. It does not happen on a MacBook with an M2 Pro, nor a MacBook with an Intel i7-3820QM.
It involves computing X'y, where X is an 8830997x41 Matrix{Float64} and y is an 8830997-entry Vector{Float64}. All entries in both are 0.0 or 1.0. In fact X is sparse: each row has at most one 1.0. At least the last entry of the result is wrong. A compressed .jld file with the two matrices stored in Int16 is here. A demo script is below. The 6th and last lines compare two ways of computing the same thing; they should be equal but, after conversion from Int16 to Float64, are not. Look at the 320.0 in the lower left. Screenshot of running this with and without using MKL, in Julia 1.9.4.
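The comparison described above can be sketched as follows (a stand-in, not the original demo script: the names `_X` and `_y` follow the snippet quoted earlier in the thread, but the data here are random 0/1 stand-ins rather than the .jld file, and the dimensions are shrunk):

```julia
# Compare the BLAS gemv path (X'y) against a plain dot product of the
# last column with y. On a correct BLAS these agree exactly for 0/1 data,
# since all partial sums are exactly representable in Float64.
using LinearAlgebra, Random

Random.seed!(1)
n, k = 10_000, 41                   # small stand-in for the 8830997x41 case
_X = Float64.(rand(n, k) .< 0.01)   # sparse-ish 0/1 matrix
_y = Float64.(rand(n) .< 0.5)       # 0/1 vector

a = (_X' * _y)[end]                 # computed via BLAS gemv
b = _X[:, end]' * _y                # computed as a single dot product
println(a == b)                     # true on a correct BLAS backend
```

On the affected machine with the misdetected OpenBLAS, the two values reported in this thread diverged, while MKL, BLIS, and a correctly detected OpenBLAS agree.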