Open jishnub opened 1 month ago
The failure reported above is reproducible for me on both aarch64-darwin
and aarch64-linux
with Base.runtests("LinearAlgebra/lapack"; seed=0xf3fab14bb37f52ec47cdcf1f85e908bd)
, I guess it depends on the OpenBLAS kernel used.
Minimal reproducer should be
using LinearAlgebra
A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705];
A = A + transpose(A); #symmetric!
B = copy(A);
B,ipiv = LAPACK.sytrf!('U',B);
isapprox(triu(inv(A)), triu(LAPACK.sytri!('U',B,ipiv)); rtol=eps(cond(A)))
which fails for me also on x86_64-linux
. BTW, the testset does fail for me on
julia> versioninfo()
Julia Version 1.12.0-DEV.1431
Commit fc40e629b1d (2024-10-18 19:33 UTC)
Build Info:
Official https://julialang.org release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 384 × AMD EPYC 9654 96-Core Processor
WORD_SIZE: 64
LLVM: libLLVM-18.1.7 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 384 virtual cores)
This system is using Cooperlake
OpenBLAS core type:
julia> using OpenBLAS_jll, LinearAlgebra
julia> strip(unsafe_string(ccall(((:openblas_get_config64_), libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.28 USE64BITINT DYNAMIC_ARCH NO_AFFINITY Cooperlake MAX_THREADS=512"
For what is worth:
using LinearAlgebra
A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705];
A = A + transpose(A); #symmetric!
B = copy(A);
B,ipiv = LAPACK.sytrf!('U',B);
c1 = triu(inv(A));
c2 = triu(LAPACK.sytri!('U',B,ipiv));
@show norm(c1 - c2) / max(norm(c1), norm(c2))
@show eps(cond(A))
results in
norm(c1 - c2) / max(norm(c1), norm(c2)) = 2.119341f-6
eps(cond(A)) = 1.9073486f-6
Locally, I obtain
julia> using LinearAlgebra
julia> A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705];
julia> A = A + transpose(A); #symmetric!
julia> B = copy(A);
julia> B,ipiv = LAPACK.sytrf!('U',B);
julia> isapprox(triu(inv(A)), triu(LAPACK.sytri!('U',B,ipiv)); rtol=eps(cond(A)))
true
norm(c1 - c2) / max(norm(c1), norm(c2)) = 8.0597636f-7
eps(cond(A)) = 1.9073486f-6
julia> using OpenBLAS_jll, LinearAlgebra
julia> strip(unsafe_string(ccall(((:openblas_get_config64_), libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.28 USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=512"
julia> versioninfo()
Julia Version 1.12.0-DEV.1438
Commit e08280a24fb (2024-10-20 01:04 UTC)
Build Info:
Official https://julialang.org release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 12 × Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
WORD_SIZE: 64
LLVM: libLLVM-18.1.7 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
JULIA_EDITOR = subl
Adding a few top level tests, which basically make the same LAPACK calls:
julia> using OpenBLAS_jll, LinearAlgebra
julia> strip(unsafe_string(ccall(((:openblas_get_config64_), libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.27 USE64BITINT DYNAMIC_ARCH NO_AFFINITY SkylakeX MAX_THREADS=512"
julia> A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705];
julia> A = A + transpose(A); #symmetric!
julia> B = bunchkaufman(A);
julia> R = bunchkaufman(A,true); # rook pivoting
julia> isapprox(inv(B), inv(A); rtol=eps(cond(A)))
false
julia> isapprox(B.P'*inv(B.U')*inv(B.D)*inv(B.U)*B.P, inv(A); rtol=eps(cond(A)))
true
julia> isapprox(inv(R), inv(A); rtol=eps(cond(A)))
true
Yes, the test passes for me on Haswell:
julia> using OpenBLAS_jll, LinearAlgebra
julia> strip(unsafe_string(ccall(((:openblas_get_config64_), libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.28 USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=512"
julia> A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705];
julia> A = A + transpose(A); #symmetric!
julia> B = bunchkaufman(A);
julia> R = bunchkaufman(A,true); # rook pivoting
julia> isapprox(inv(B), inv(A); rtol=eps(cond(A)))
true
julia> isapprox(B.P'*inv(B.U')*inv(B.D)*inv(B.U)*B.P, inv(A); rtol=eps(cond(A)))
true
julia> isapprox(inv(R), inv(A); rtol=eps(cond(A)))
true
Yes, the test passes for me on Haswell:
Thank you very much for the confirmation! Could you kindly compare your outputs to mine in the JLD2 file below?
The JLD2 file was generated in Julia as follows:
julia> using JLD2
julia> save("56255-1.jld2",Dict("A" => A, "B" => B, "R" => R, "inv(B)" => inv(B), "inv(A)" => inv(A), "inv(R)" => inv(R),"cond(A)" => cond(A), "eps(cond(A))" => eps(cond(A))))
And its contents can be inspected as follows:
julia> using JLD2
julia> L = load("56255-1.jld2")
Dict{String, Any} with 8 entries:
"cond(A)" => 31.674
"B" => BunchKaufman{Float32, Matrix{Float32}, Vector{Int64}}(Float32[0.858494 -0.220169 … 0.218334 0.84004…
"A" => Float32[1.03302 1.13542 … 1.28012 1.48514; 1.13542 0.437221 … 0.911591 0.804944; … ; 1.28012 0.9115…
"inv(R)" => Float32[1.16483 0.256459 … -0.827415 0.334034; 0.256459 -0.671429 … 0.51427 -0.288833; … ; -0.82741…
"inv(B)" => Float32[1.16483 0.25646 … -0.827415 0.334034; 0.25646 -0.671429 … 0.51427 -0.288833; … ; -0.827415 …
"inv(A)" => Float32[1.16483 0.256459 … -0.827415 0.334033; 0.256459 -0.671429 … 0.51427 -0.288833; … ; -0.82741…
"R" => BunchKaufman{Float32, Matrix{Float32}, Vector{Int64}}(Float32[1.92042 0.663282 … 1.42713 0.84004; 1…
"eps(cond(A))" => 1.90735f-6
julia> L["cond(A)"]
31.67395f0
Although in my case isapprox(B.P'*inv(B.U')*inv(B.D)*inv(B.U)*B.P, inv(A); rtol=eps(cond(A)))
is true
, I suspect that the problem might lie in B
already, i.e., SSYTRF
. Will be interesting to know (+ I've never looked at an S- version in detail so far).
Here's what I obtain:
Here's what I obtain:
Comparison between matrices
Thank you very much again, @jishnub. Please see below for some general points and some points on your above comparison.
(X)GELSD/Y
.Regarding your comparison, I see (analysis below) that when the deltas to BunchKaufman's D
and U
are applied on my PC, the inversion seems to satisfy the current rtol=eps(cond(A))
, indicating that the issue might lie in (X)SYTRF
already. NOTE that there are no tests in test/lapack.jl
presently for (X)SYTRF
. Tests are only for the combination with (X)SYTRI
.
Outcome:
lapack.jl
and SSYTRF
to isolate the problem or recommend increasing the rtol
value analogous to (X)GELSD/Y
.isapprox(B.P'*inv(B.U')*inv(B.D)*inv(B.U)*B.P, inv(A); rtol=eps(cond(A)))
is true
, the tolerance is likely too tight.rtol
bound for Bunch-Kaufman decomposition-based inversion and utilizing it to revise the current rtol
is perhaps the best way to go.Extending the analysis in [1] given below for inversion, it seems that the required rtol
can be significantly higher than eps(cond(A))
.
The following script (for Gaussian matrices) seems to agree with this.
function worstN(N)
R = zeros(N,1)
E = zeros(N,1)
B = zeros(Bool,N,1)
for i in 1:N
A = (X->(X+X')/2)(Float32.(randn(4,4)))
A1 = inv(A)
A2 = inv(bunchkaufman(A))
R[i] = norm(A1-A2)/max(norm(A1), norm(A2))
E[i] = eps(cond(A))
B[i] = isapprox(A1, A2; rtol=eps(cond(A)))
end
maximum(R./E)
end
for i = 1:100 display(worstN(1000000)); end
Note: Also applies to Float64
.
[1] | N. J. Higham, "Stability of the Diagonal Pivoting Method with Partial Pivoting," Report no. 265, https://www.netlib.org/lapack/lawnspdf/lawn105.pdf |
In the
aarch64-apple-darwin
job https://buildkite.com/julialang/julia-master/builds/41270#0192a51c-3afb-4a6f-a53f-3c3b9f537ed4, there isUnfortunately, I am not being able to replicate it locally using the same seed but on a different platform (x86_64-linux-gnu). The issue seems to be related to floating-point accuracy.