Test failure in `LinearAlgebra/lapack.jl` in comparing `LAPACK.sytri!` to `inv` for an almost-symmetric matrix

jishnub commented 2 weeks ago

In the aarch64-apple-darwin job https://buildkite.com/julialang/julia-master/builds/41270#0192a51c-3afb-4a6f-a53f-3c3b9f537ed4, there is

The global RNG seed was 0xf3fab14bb37f52ec47cdcf1f85e908bd.
Error in testset LinearAlgebra/lapack:
Test Failed at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-master/julia-d3cd3f4a3a/share/julia/stdlib/v1.12/LinearAlgebra/test/lapack.jl:441
  Expression: ≈(triu(inv(A)), triu(LAPACK.sytri!('U', B, ipiv)), rtol = eps(cond(A)))
   Evaluated: Float32[1.1648309 0.25645927 … -0.82741505 0.3340336; 0.0 -0.6714291 … 0.5142702 -0.2888328; … ; 0.0 0.0 … 0.086971946 0.14950116; 0.0 0.0 … 0.0 0.38216618] ≈ Float32[1.164831 0.25645974 … -0.82741505 0.3340336; 0.0 -0.6714287 … 0.5142697 -0.2888328; … ; 0.0 0.0 … 0.08697438 0.1495005; 0.0 0.0 … 0.0 0.3821662] (rtol=1.9073486e-6)
ERROR: LoadError: Test run finished with errors
in expression starting at /Users/julia/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-honeycrisp-HL2F7YQ3XH.0/build/default-honeycrisp-HL2F7YQ3XH-0/julialang/julia-master/julia-d3cd3f4a3a/share/julia/test/runtests.jl:100
ERROR: A test has failed. Please submit a bug report (https://github.com/JuliaLang/julia/issues)
including error messages above and the output of versioninfo():
Julia Version 1.12.0-DEV.1426
Commit d3cd3f4a3aa (2024-10-19 14:08 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: macOS (arm64-apple-darwin24.0.0)
  CPU: 8 × Apple M2
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, apple-m2)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)
Environment:
  JULIA_INSTALL_DIR = julia-d3cd3f4a3a
  JULIA_SHELL = /bin/bash
  JULIA_CPU_TARGET = generic;apple-m1,clone_all
  JULIA_TEST_MAXRSS_MB = 3800
  JULIA_CMD_FOR_TESTS = julia-d3cd3f4a3a/bin/julia .buildkite/utilities/timeout.jl julia-d3cd3f4a3a/bin/julia
  JULIA_TEST_VERBOSE_LOGS_DIR = /private/var/tmp/agent-tempdirs/default-honeycrisp-HL2F7YQ3XH.0/tmp/jl_JXug2a
  JULIA_IMAGE_THREADS = 4
  JULIA_BINARYDIST_FILENAME = julia-d3cd3f4a3a-macaarch64
  JULIA_CPU_THREADS = 4
  JULIA_NUM_THREADS = 1
  JULIA_VERSION = 1.12.0-DEV
  JULIA_TEST_IS_BASE_CI = true

Unfortunately, I am not being able to replicate it locally using the same seed but on a different platform (x86_64-linux-gnu). The issue seems to be related to floating-point accuracy.

giordano commented 2 weeks ago

The failure reported above is reproducible for me on both aarch64-darwin and aarch64-linux with Base.runtests("LinearAlgebra/lapack"; seed=0xf3fab14bb37f52ec47cdcf1f85e908bd), I guess it depends on the OpenBLAS kernel used.

giordano commented 2 weeks ago

Minimal reproducer should be

using LinearAlgebra
A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705];
A = A + transpose(A); #symmetric!
B = copy(A);
B,ipiv = LAPACK.sytrf!('U',B);
isapprox(triu(inv(A)), triu(LAPACK.sytri!('U',B,ipiv)); rtol=eps(cond(A)))

which fails for me also on x86_64-linux. BTW, the testset does fail for me on

julia> versioninfo()
Julia Version 1.12.0-DEV.1431
Commit fc40e629b1d (2024-10-18 19:33 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 384 × AMD EPYC 9654 96-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, znver4)
Threads: 1 default, 0 interactive, 1 GC (on 384 virtual cores)

This system is using Cooperlake OpenBLAS core type:

julia> using OpenBLAS_jll, LinearAlgebra

julia> strip(unsafe_string(ccall(((:openblas_get_config64_), libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.28  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Cooperlake MAX_THREADS=512"

For what is worth:

using LinearAlgebra
A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705];
A = A + transpose(A); #symmetric!
B = copy(A);
B,ipiv = LAPACK.sytrf!('U',B);
c1 = triu(inv(A));
c2 = triu(LAPACK.sytri!('U',B,ipiv));
@show norm(c1 - c2) / max(norm(c1), norm(c2))
@show eps(cond(A))

results in

norm(c1 - c2) / max(norm(c1), norm(c2)) = 2.119341f-6
eps(cond(A)) = 1.9073486f-6

jishnub commented 2 weeks ago

Locally, I obtain

julia> using LinearAlgebra

julia> A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705];

julia> A = A + transpose(A); #symmetric!

julia> B = copy(A);

julia> B,ipiv = LAPACK.sytrf!('U',B);

julia> isapprox(triu(inv(A)), triu(LAPACK.sytri!('U',B,ipiv)); rtol=eps(cond(A)))
true

norm(c1 - c2) / max(norm(c1), norm(c2)) = 8.0597636f-7
eps(cond(A)) = 1.9073486f-6

julia> using OpenBLAS_jll, LinearAlgebra

julia> strip(unsafe_string(ccall(((:openblas_get_config64_), libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.28  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=512"

julia> versioninfo()
Julia Version 1.12.0-DEV.1438
Commit e08280a24fb (2024-10-20 01:04 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_EDITOR = subl

aravindh-krishnamoorthy commented 2 weeks ago

Adding a few top level tests, which basically make the same LAPACK calls:

julia> using OpenBLAS_jll, LinearAlgebra
julia> strip(unsafe_string(ccall(((:openblas_get_config64_), libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.27  USE64BITINT DYNAMIC_ARCH NO_AFFINITY SkylakeX MAX_THREADS=512"

julia> A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705];
julia> A = A + transpose(A); #symmetric!
julia> B = bunchkaufman(A);
julia> R = bunchkaufman(A,true); # rook pivoting

julia> isapprox(inv(B), inv(A); rtol=eps(cond(A)))
false
julia> isapprox(B.P'*inv(B.U')*inv(B.D)*inv(B.U)*B.P, inv(A); rtol=eps(cond(A)))
true
julia> isapprox(inv(R), inv(A); rtol=eps(cond(A)))
true

Btw, in which version / configuration does this test pass? Haswell?

jishnub commented 1 week ago

Yes, the test passes for me on Haswell:

julia> using OpenBLAS_jll, LinearAlgebra

julia> strip(unsafe_string(ccall(((:openblas_get_config64_), libopenblas), Ptr{UInt8}, () )))
"OpenBLAS 0.3.28  USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell MAX_THREADS=512"

julia> A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705];

julia> A = A + transpose(A); #symmetric!

julia> B = bunchkaufman(A);

julia> R = bunchkaufman(A,true); # rook pivoting

julia> isapprox(inv(B), inv(A); rtol=eps(cond(A)))
true

julia> isapprox(B.P'*inv(B.U')*inv(B.D)*inv(B.U)*B.P, inv(A); rtol=eps(cond(A)))
true

julia> isapprox(inv(R), inv(A); rtol=eps(cond(A)))
true

aravindh-krishnamoorthy commented 1 week ago

Yes, the test passes for me on Haswell:

Thank you very much for the confirmation! Could you kindly compare your outputs to mine in the JLD2 file below?

The JLD2 file was generated in Julia as follows:

julia> using JLD2
julia> save("56255-1.jld2",Dict("A" => A, "B" => B, "R" => R, "inv(B)" => inv(B), "inv(A)" => inv(A), "inv(R)" => inv(R),"cond(A)" => cond(A), "eps(cond(A))" => eps(cond(A))))

And its contents can be inspected as follows:

julia> using JLD2
julia> L = load("56255-1.jld2")
Dict{String, Any} with 8 entries:
  "cond(A)"      => 31.674
  "B"            => BunchKaufman{Float32, Matrix{Float32}, Vector{Int64}}(Float32[0.858494 -0.220169 … 0.218334 0.84004…
  "A"            => Float32[1.03302 1.13542 … 1.28012 1.48514; 1.13542 0.437221 … 0.911591 0.804944; … ; 1.28012 0.9115…
  "inv(R)"       => Float32[1.16483 0.256459 … -0.827415 0.334034; 0.256459 -0.671429 … 0.51427 -0.288833; … ; -0.82741…
  "inv(B)"       => Float32[1.16483 0.25646 … -0.827415 0.334034; 0.25646 -0.671429 … 0.51427 -0.288833; … ; -0.827415 …
  "inv(A)"       => Float32[1.16483 0.256459 … -0.827415 0.334033; 0.256459 -0.671429 … 0.51427 -0.288833; … ; -0.82741…
  "R"            => BunchKaufman{Float32, Matrix{Float32}, Vector{Int64}}(Float32[1.92042 0.663282 … 1.42713 0.84004; 1…
  "eps(cond(A))" => 1.90735f-6

julia> L["cond(A)"]
31.67395f0

Although in my case isapprox(B.P'*inv(B.U')*inv(B.D)*inv(B.U)*B.P, inv(A); rtol=eps(cond(A))) is true, I suspect that the problem might lie in B already, i.e., SSYTRF. Will be interesting to know (+ I've never looked at an S- version in detail so far).

JLD2 File

- Copy the below text to clipboard. - Issue `$ uudecode | bunzip2 > 56255-1.jld2` - Paste the clipboard contents into the console. - Load the `.jld2` file as indicated above. ```verbatim begin-base64 644 - QlpoOTFBWSZTWVWJmgEABwj/////////////////////////v//f///7//// //3//+//4AmW+AAAFVxbePZR3s3FuXZ7bnIanhoiAptEyNNU9PREeUb0U9Am 1MJkw1HijI9T1PUG0nqbUaeRPUeoD0gxM1Hqfqaj0aaEGxqnpD0ZT1PKbU0G 1Bp5Ro2psp6EekbEmnpPJBjKeUNNPSeo8UGkppoFP1R6g8mjKflBPU009NR+ pMj1NAG1ANND1DQPUAAejUNDTQGgA0A9QNAGQAAANANGgAANAPUBoDQ0BoBq aUGGlMaKbCmh6jajTQepkHqNB6R6j1AAGhoADQABoADQAAAAAAAA0GgAAGgA AAAAAAGpoEKT8pHlPUNDGppo0ND1AA0ADQAA0AAAAGg0AADQAGQAPUAD1AAA AAAAAAAA0AEAAAAAAAAAAAAAAAAAAAAAaaAAAAAAAAAAAAAAAAAAAAaBJEIR kRo0lPTek2oU3qbJT9U080k9GpoGelPRtU/TVMT9SehGmTIPFNpP1T1HqNNl PUaYQbSHqeUGgeoAaPU0yDQwIYRpkGyT1HqAyGmgA9I8o0izhLSh6C7BrpFk NrY1gICwGCoAP+DzWJCTVImKVMJeoN6s3iW0OmpGkqxB4RIKflWyAPwYCjRa QbRy8Bk3TaNzzxqMaHxMBPmGCfgCBjZ/IorScIzCoxc2czjBg312gnx8eFrA c4xTNO0uepbTPqOReBpmKARPGAX9bh8XgX8xE82wLkIGexGZwEYPH7OBYZIF hMAi9AZAwyQZPlrlXIxkkZQdUBRdjnKs7hTSSFqGFPd0F5POw1pqSZh6K4zY 1WJM8khFS1IYt1YcgRNOIdbDrXVo1yegTJAi8EDAhUEjjDHCwBJIkhgEGHO4 cddcBXcCnCtyCi7odDYwNHeXM/n5qUphhMUicALINnAGSkyAZhIC+TIoEyEI p1QDgDwTgk5O5rkzMjkmV1ZxNUVVSJ2BK43LteDBjbd1Y02RKwBSBMkF6MwY 2NBV2GaXNVp7QSZJVE0IpnGURsQnhBAdAwIxAC9M974UYbGcWnIkMNPvWDQC GIIYGTIUyaBBwBxMIvGEwwk1A7auaIXSVRM8mEQGGOLEFyERMbASGQMyBTsk v6Xr72cHRWQMBV9p1AK7v47oHEyQtaySBxkhhYSc3eD9IUAV8wbqs4iAUV4C gIdxCjTVNKWoD1h6c2BkG2FNqtC2aoGUGNywNYvrwbioEGnYGHciFr7fadv2 +ug/Wbn3LdY9ytUVfg778LbsOFa4dtFsruHILERCBU2CBNBRiCYgG+CK+7Lw RZJflitWoJyEBA8UdKpERMgYlJM4nWMZmt5XaTlC+ryRYLPfilNU5gELIRD/ uOfnv9HJWLf7CFTRemHMJsf54t/1fCD/AautRijBUwmzrOhmRHAeHzL94F6a t5uUGd3OVZ4Qa/ic/TEKATB+ouBhhDzNktMbXITdCU1C8wYBbiWg0QFqklMz p/H7TqSi1vNSmY9dy/OmqRmgMcAM2hWYGO1IDRSa7Ps8Slppm7fglvGR1IMG QlAZRUTcDfc1WAtMBwQ4mVr/FuYM4B6nprJxPo2OcFfSNzrN34eWF8uu2uSA IjHMYQF2AmVVUwL3GZs+KYX8JJIMtN3NN6UxHy0WAFx4ZXZChecpeIuILoD8 n18Oxc935rD35XqDb/jDyHSi6MPDySfCCv/GAqpjIuHcXvmwGgpKDj3aOzRV gtw02hdnzLCR8FaJzDCamZd7sjJ8NsMufYgbaQr4NEqKBO02SkJ/wuIXtww8 bv/XBoqy6vNrsxgHBRJDTXhxq1r9ot0bqV+9sH0iM1lBImArFvfEDoEoEn10 PTlR9xUHi2xm+YG9dKuHPNY0D6WQsdeVC2T9XpChBTxVd82ij6cIxI1wxOOC alnndhcWLdVlMb8gRSNjr+JyL5QgU2ZbgAnDqKepQ0zMXbF0EXkDNNUxyBa/ LwGzJV5404XkAGXDK97onplC2SoVMwZYAdpFGokPKHUamShaci1kFp7Zd1Ea ZP8NYUMqIom6MeZwgdvULQIs8UPvqTPuMHGuIIt+a9V4J055GDQXseMTI0Qp obGHewiP3MyWg6/yIYPEPUIEQ9DpfZCG3YwkE2f6PH8sUG1qfwkSR21oQSZO FOThfGYhxi2I66jKVDmqK4UuBMY1flaKqQqS9Jbw9GGJk+6UI8duse/j4ygP mX8ZHTmHJ8UAvojBeSvHDC/ELbuIeQLRQhdbOe3tLOUNwYda4sgWtv9G492y mJh3y2pbnSGor/pd9EqKJs2JhX9yPSDIRpP4p63RXR+kJNhIUZX33F32rKoQ G5qtrwKEzOijDIx6vc6CGbE0QJ01FKRLCRxQWk83aoFQTNzF0aSUHNKZJyKY uiVE+FKUyH4IerlYU3Jko6MJd8PqlCyaaeOXJvswXll5oI6HRz+Xx5eZSimH 5IyhWYEAF60H6AlXIX6CqvnAXW/qq69WjAVLYjlEAU4qspr3I5A7JZdz+ee1 VMREOk7mRoAIJMCIa1wDAYtUY8mzIEPCsotMOoE18CmeEyTwEfdxhy+FGax4 31+bRrII/LpbvR0lr+OhVMWCkDAS6IsPRE30DVIQQxF+JNHuRNpUkYulhRzI drx8eF2YtLZbAkbzZlgM6Fo2Xivgw4ZSsA1Sf5vwUwzc81chE5ip3NuVvwQW IV0Zky1QRWttwFJmaz/UxZHxYgFworb0aE6eyAhPJHxzLp3hQFVikwvkpbvO JNqcrOi5YECDyogsQDOGolKTwyIfhRxSvOpXlmQfMX4l9c8dkbuuQZ6GqMo7 ciC9mULEYktWA2GaQH3T4EOyNJMNozeyMmcJsBaQaChGFd06NJE46KZiEQ85 5NAr3h7t+HkWastxMQxBeJ2QoWUC5KIGdE84mD3nhnVGfDSfRnGCkOC1piu9 5TbQBjVMKPItCxMQQTBI1IDrIQrou8xgwSAYhrKdx1kjmE6eWo9FKJDoGhAq I2cBoOiAkcP1lnQF8iEkIrEBEmiKgsgTxAkTDBRpDebCTPoINqWtjMxhQWcF aU28eQANLY7uXQbwPC2OOe5wrZrBh42GCRVp78rGSZekRtK2jrg6sQ/5k/Qd /WxJp52+Rb8I3HUfZ/S7HfUHwOjqPBAzMLI0gbL4SYxU7vQFLc4t+wdOHs5F rrwKBkgzE1YJhS7BGq2FZOtKgNZOd+jqvh1N8w3tkB9yz6izF3ngH0ezcgcJ q9ApKPx8CQJm0irpM2y0y5jtQdPAPl4SY+V7eh/T1XrC6IXuA78Mpjpv50I8 LMXyRfKDIsYJp9Y686V9WGi2/Djnj2ux8vYzeXygoUbJvvAhK9J+CFHZehlB E7Ku/1g2YRDYzbb+0Z2icpFDW284wZEyzJxxE+PH1JUF3gJC8kaD7AcBA4Hr ZmrR4oYkid21YHAKbcmIxSA+b5FWAtqouQMaWzdN0VGsYMFQEQmL2U0CsoV7 qqtXbJvpoRgxg5ZVqO5FNtQofvA63mODGoPJCjzbKPNh3f6N6c8G6x6EwhsS CRpsHPhSXVR/WVFFH+xPNareU36YwahEpvo3Txmd98cyYuNYhFyCVLWZq0xc RJ1IL+F7qMrN937+rjnObBz4QhVJpkrFgjI2poQbYF/Lc1kRyowncCSOcE6U CIniRQThgiJx2FS0MpIjtXUn1TFABXE1fMuM1fzKBcJ7aDEPb1FmgBIhq10z pSzyAQKNj1Otl/XYhgaoQqHhfJe4ifnwncToNejISamyAUAv67M4a47TArXc BBwMVZwHNNwNKlHJLdL1Tj1UuRkjeEeZtEDwlcRK9rKVgLaye5bsGMG0ZRPJ Vei4eAMbVXirfy6r0+eAtBPEFgRVPFYpCkPjOnE5wqeq4ZpWAlTosbYP9CjY ohnjFHNZYBJCfkhogIUCy57bBEaDrXHa6s6WAZmIO8f4EROiyH5g9vRxuY/8 AYyxdEGX2X3NxbNeFJZGr+NuAfYazut16/eAszhv48+eKF6tuJvm3EuTUMU6 gmOchHWZYBZJqAGho/2EDHLAcYDHesbnbbUVOpKGG1RwD1CbRb0NfmBrQ1Qa eq0K0HLZu3hc2rnrg+wF718ac/RtAaIWfSnA/2dTL2n9aTtWKuZwklP0DA0k gGehtnfUtqUrVjknIsIa37MY0P3qrj/i7kinChIKsTNAIA== ==== ```

jishnub commented 1 week ago

Here's what I obtain:

Comparison between matrices

```julia julia> L["cond(A)"] == cond(A) true julia> L["A"] == A true julia> L["inv(A)"] == inv(A) false julia> L["inv(A)"] - inv(A) 10×10 Matrix{Float32}: -1.19209f-7 -2.38419f-7 1.78814f-7 -1.78814f-7 5.96046f-8 -2.68221f-7 -1.19209f-7 2.98023f-7 4.17233f-7 0.0 -1.19209f-7 1.78814f-7 1.19209f-7 0.0 -1.78814f-7 1.49012f-7 -2.98023f-8 -2.08616f-7 -5.96046f-8 2.98023f-8 1.19209f-7 1.19209f-7 -2.08616f-7 2.98023f-8 5.96046f-8 1.71363f-7 4.47035f-8 -1.19209f-7 -1.19209f-7 -1.3411f-7 -2.98023f-8 -1.78814f-7 2.98023f-8 1.19209f-7 5.96046f-8 -5.96046f-8 -5.96046f-8 5.96046f-8 -5.30854f-8 2.98023f-8 5.96046f-8 -1.78814f-7 2.98023f-8 2.98023f-8 5.96046f-8 -2.38419f-7 -5.96046f-8 1.93715f-7 1.19209f-7 5.96046f-8 -7.45058f-9 1.19209f-7 0.0 -5.96046f-8 -5.96046f-8 1.19209f-7 2.98023f-8 -9.68575f-8 -5.96046f-8 0.0 5.96046f-8 8.9407f-8 -7.45058f-8 8.9407f-8 0.0 -2.98023f-8 0.0 -8.9407f-8 -1.19209f-7 8.9407f-8 5.96046f-8 -1.19209f-7 -5.96046f-8 5.96046f-8 2.98023f-8 -2.23517f-8 -2.98023f-8 5.02914f-8 2.98023f-8 4.47035f-8 1.19209f-7 5.96046f-8 -1.78814f-7 -4.93601f-8 1.19209f-7 2.98023f-7 5.96046f-8 -8.9407f-8 -1.49012f-7 -2.08616f-7 -5.96046f-8 -5.96046f-8 2.98023f-8 0.0 -1.49012f-8 5.96046f-8 -4.47035f-8 1.49012f-8 5.96046f-8 0.0 julia> L["B"].U - B.U 10×10 Matrix{Float32}: 0.0 -1.78814f-7 2.98023f-7 0.0 -1.19209f-7 -2.38419f-7 -1.49012f-7 -2.98023f-7 4.47035f-8 0.0 0.0 0.0 -1.19209f-7 2.98023f-8 1.78814f-7 1.19209f-7 1.19209f-7 1.19209f-7 1.49012f-8 0.0 0.0 0.0 0.0 5.96046f-8 5.96046f-8 0.0 -1.19209f-7 -1.19209f-7 1.19209f-7 0.0 0.0 0.0 0.0 0.0 -2.38419f-7 -1.19209f-7 0.0 -1.19209f-7 -5.96046f-8 0.0 0.0 0.0 0.0 0.0 0.0 7.17118f-8 -5.96046f-8 -2.98023f-8 2.98023f-8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -1.19209f-7 -1.19209f-7 1.19209f-7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.19209f-7 1.19209f-7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 julia> L["B"].D - B.D 10×10 Tridiagonal{Float32, Vector{Float32}}: 3.57628f-7 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 -4.76837f-7 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 0.0 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 9.53674f-7 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 -5.96046f-8 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 4.76837f-7 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 0.0 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 5.96046f-8 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 -5.96046f-8 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 0.0 julia> L["inv(B)"] - inv(B) 10×10 Matrix{Float32}: -4.76837f-7 1.19209f-7 8.34465f-7 -5.96046f-8 -7.7486f-7 7.45058f-9 5.36442f-7 -1.04308f-6 1.13249f-6 -7.15256f-7 1.19209f-7 3.57628f-7 -2.08616f-7 0.0 -1.78814f-7 4.47035f-8 8.9407f-8 -2.98023f-8 -2.38419f-7 5.96046f-8 8.34465f-7 -2.08616f-7 1.66893f-6 2.98023f-8 -6.55651f-7 -9.08971f-7 -2.38419f-7 -2.92063f-6 1.84774f-6 -6.25849f-7 -5.96046f-8 0.0 2.98023f-8 5.96046f-8 5.96046f-8 -5.96046f-8 -2.98023f-8 -1.49012f-7 1.46218f-7 5.96046f-8 -7.7486f-7 -1.78814f-7 -6.55651f-7 5.96046f-8 5.36442f-7 4.76837f-7 2.98023f-7 1.72853f-6 -7.15256f-7 8.9407f-8 7.45058f-9 4.47035f-8 -9.08971f-7 -5.96046f-8 4.76837f-7 3.57628f-7 -1.78814f-7 1.57952f-6 -1.19209f-6 4.76837f-7 5.36442f-7 8.9407f-8 -2.38419f-7 -2.98023f-8 2.98023f-7 -1.78814f-7 -4.17233f-7 -5.96046f-8 -2.98023f-7 3.12924f-7 -1.04308f-6 -2.98023f-8 -2.92063f-6 -1.49012f-7 1.72853f-6 1.57952f-6 -5.96046f-8 5.24521f-6 -3.12924f-6 1.08778f-6 1.13249f-6 -2.38419f-7 1.84774f-6 1.46218f-7 -7.15256f-7 -1.19209f-6 -2.98023f-7 -3.12924f-6 1.78814f-6 -7.7486f-7 -7.15256f-7 5.96046f-8 -6.25849f-7 5.96046f-8 8.9407f-8 4.76837f-7 3.12924f-7 1.08778f-6 -7.7486f-7 5.36442f-7 julia> L["inv(R)"] - inv(R) 10×10 Matrix{Float32}: 0.0 2.08616f-7 -5.96046f-8 3.8743f-7 2.98023f-8 -5.02914f-7 0.0 1.19209f-7 -1.78814f-7 2.68221f-7 2.08616f-7 2.38419f-7 -1.19209f-7 1.78814f-7 0.0 -1.78814f-7 -2.98023f-8 -7.45058f-8 -2.38419f-7 5.96046f-8 -5.96046f-8 -1.19209f-7 1.04308f-7 -2.38419f-7 -5.96046f-8 1.56462f-7 4.47035f-8 -1.19209f-7 2.38419f-7 -1.3411f-7 3.8743f-7 1.78814f-7 -2.38419f-7 1.78814f-7 1.49012f-7 -5.96046f-8 -1.78814f-7 2.98023f-8 -3.51109f-7 1.78814f-7 2.98023f-8 0.0 -5.96046f-8 1.49012f-7 1.19209f-7 -1.19209f-7 -6.70552f-8 1.49012f-7 -1.19209f-7 4.47035f-8 -5.02914f-7 -1.78814f-7 1.56462f-7 -5.96046f-8 -1.19209f-7 3.57628f-7 1.49012f-7 -1.49012f-8 3.27826f-7 -5.96046f-8 0.0 -2.98023f-8 4.47035f-8 -1.78814f-7 -6.70552f-8 1.49012f-7 5.96046f-8 -1.19209f-7 2.98023f-8 5.96046f-8 1.19209f-7 -7.45058f-8 -1.19209f-7 2.98023f-8 1.49012f-7 -1.49012f-8 -1.19209f-7 1.49012f-7 -1.19209f-7 5.96046f-8 -1.78814f-7 -2.38419f-7 2.38419f-7 -3.51109f-7 -1.19209f-7 3.27826f-7 2.98023f-8 -1.19209f-7 3.20375f-7 -5.96046f-8 2.68221f-7 5.96046f-8 -1.3411f-7 1.78814f-7 4.47035f-8 -5.96046f-8 5.96046f-8 5.96046f-8 -5.96046f-8 -2.38419f-7 julia> L["R"].U - R.U 10×10 Matrix{Float32}: 0.0 0.0 1.78814f-7 1.86265f-7 -2.79397f-8 1.49012f-7 0.0 0.0 0.0 0.0 0.0 0.0 5.96046f-8 -2.38419f-7 0.0 -1.19209f-7 2.98023f-8 -5.96046f-8 0.0 0.0 0.0 0.0 0.0 -4.47035f-8 5.21541f-8 0.0 3.72529f-8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 8.3819f-8 8.9407f-8 5.96046f-8 7.45058f-9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 8.9407f-8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -5.96046f-8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 julia> L["R"].D - R.D 10×10 Tridiagonal{Float32, Vector{Float32}}: -1.19209f-7 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 -2.38419f-7 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 -5.96046f-8 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 -5.96046f-8 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 7.45058f-8 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 -2.98023f-8 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 0.0 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 0.0 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 0.0 0.0 ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.0 0.0 ```

aravindh-krishnamoorthy commented 1 week ago

Here's what I obtain:

Comparison between matrices

Thank you very much again, @jishnub. Please see below for some general points and some points on your above comparison.

OpenBLAS's Apple silicon implementations seem to favour speed over consistency of results, see https://github.com/JuliaLang/julia/issues/52539. So, tolerances for Apple silicons may need to be relaxed to take this into account.
x86 (at least a while ago) code had additional logic to ensure consistency. However, this might have changed, esp. for 32-bit floating point values.
Sometimes algorithms require higher tolerances, see https://github.com/JuliaLang/julia/pull/25418 for (X)GELSD/Y.

Regarding your comparison, I see (analysis below) that when the deltas to BunchKaufman's D and U are applied on my PC, the inversion seems to satisfy the current rtol=eps(cond(A)), indicating that the issue might lie in (X)SYTRF already. NOTE that there are no tests in test/lapack.jl presently for (X)SYTRF. Tests are only for the combination with (X)SYTRI.

Analysis of the comparison

```julia # Original A matrix from above julia> A = Float32[0.5165111 0.40158212 0.5574516 0.8424667 0.73727703 0.99013567 0.027817607 0.7097305 0.5340295 0.8029875; 0.73383516 0.21861029 0.79692626 0.31265068 0.82242167 0.96243966 0.32410073 0.043915033 0.39456594 0.27034175; 0.3094836 0.52017105 0.19609857 0.8344968 0.3720402 0.93413603 0.18523753 0.22515428 0.87819797 0.2643844; 0.1071493 0.19598752 0.45016456 0.69307053 0.26639366 0.30572063 0.052308798 0.64963084 0.070156276 0.18587488; 0.12925565 0.3273763 0.6430414 0.20246255 0.56501704 0.31385344 0.51954776 0.26923156 0.28905785 0.008479714; 0.91714334 0.056781054 0.4873765 0.7959971 0.08249408 0.61231 0.6228918 0.73107404 0.49783945 0.027323008; 0.25009543 0.2422458 0.21773565 0.25646633 0.4824924 0.7195719 0.54094857 0.9027088 0.7447457 0.5782057; 0.78010947 0.016479433 0.43123013 0.6893958 0.36214143 0.9177889 0.47928184 0.11119521 0.43025148 0.5782057; 0.74608874 0.5170252 0.9612415 0.7183566 0.41310024 0.9337084 0.25631362 0.40240157 0.45893312 0.8839705; 0.6821535 0.53460234 0.7399696 0.0029093027 0.42698807 0.22273403 0.8483786 0.7350969 0.8365097 0.8839705]; julia> A = A + transpose(A); #symmetric! # Load the JLD file as well as define the deltas (Haswell to SkylakeX). julia> using LinearAlgebra, JLD2 julia> L = load("56255-1.jld2"); julia> B = L["B"]; julia> dU = Matrix{Float32}([ 0.0 -1.78814f-7 2.98023f-7 0.0 -1.19209f-7 -2.38419f-7 -1.49012f-7 -2.98023f-7 4.47035f-8 0.0 0.0 0.0 -1.19209f-7 2.98023f-8 1.78814f-7 1.19209f-7 1.19209f-7 1.19209f-7 1.49012f-8 0.0 0.0 0.0 0.0 5.96046f-8 5.96046f-8 0.0 -1.19209f-7 -1.19209f-7 1.19209f-7 0.0 0.0 0.0 0.0 0.0 -2.38419f-7 -1.19209f-7 0.0 -1.19209f-7 -5.96046f-8 0.0 0.0 0.0 0.0 0.0 0.0 7.17118f-8 -5.96046f-8 -2.98023f-8 2.98023f-8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -1.19209f-7 -1.19209f-7 1.19209f-7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.19209f-7 1.19209f-7 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0]); julia> dD = Matrix{Float32}([ 3.57628f-7 0.0 0 0 0 0 0 0 0 0 0.0 -4.76837f-7 0.0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0 0 0 0 0 0 0 0 0.0 9.53674f-7 0.0 0 0 0 0 0 0 0 0 0.0 -5.96046f-8 0.0 0 0 0 0 0 0 0 0 0.0 4.76837f-7 0.0 0 0 0 0 0 0 0 0 0.0 0.0 0.0 0 0 0 0 0 0 0 0 0.0 5.96046f-8 0.0 0 0 0 0 0 0 0 0 0.0 -5.96046f-8 0.0 0 0 0 0 0 0 0 0 0.0 0.0]); # The original algorithm fails the current rtol julia> isapprox(inv(B), inv(A); rtol=eps(cond(A))) false julia> norm(inv(B)-inv(A)) 8.977805f-6 # However, applying the diagonal and U deltas passes the test! julia> B1 = BunchKaufman(B.LD + dD + dU, B.ipiv, 'U', true, false, 0); julia> isapprox(inv(B1), inv(A); rtol=eps(cond(A))) true julia> norm(inv(B1)-inv(A)) 4.343118f-6 ```

Outcome:

Unfortunately, I could not yet decide if I should look further at lapack.jl and SSYTRF to isolate the problem or recommend increasing the rtol value analogous to (X)GELSD/Y.
Given that isapprox(B.P'*inv(B.U')*inv(B.D)*inv(B.U)*B.P, inv(A); rtol=eps(cond(A))) is true, the tolerance is likely too tight.
Computing an rtol bound for Bunch-Kaufman decomposition-based inversion and utilizing it to revise the current rtol is perhaps the best way to go.
I might have to think a bit further on this.

aravindh-krishnamoorthy commented 3 hours ago

Extending the analysis in [1] given below for inversion, it seems that the required rtol can be significantly higher than eps(cond(A)).

The following script (for Gaussian matrices) seems to agree with this.

function worstN(N)
R = zeros(N,1)
E = zeros(N,1)
B = zeros(Bool,N,1)
for i in 1:N
    A = (X->(X+X')/2)(Float32.(randn(4,4)))
    A1 = inv(A)
    A2 = inv(bunchkaufman(A))
    R[i] = norm(A1-A2)/max(norm(A1), norm(A2))
    E[i] = eps(cond(A))
    B[i] = isapprox(A1, A2; rtol=eps(cond(A)))
end
maximum(R./E)
end

for i = 1:100 display(worstN(1000000)); end

Note: Also applies to Float64.


[1]	N. J. Higham, "Stability of the Diagonal Pivoting Method with Partial Pivoting," Report no. 265, https://www.netlib.org/lapack/lawnspdf/lawn105.pdf

JuliaLang / julia

Test failure in `LinearAlgebra/lapack.jl` in comparing `LAPACK.sytri!` to `inv` for an almost-symmetric matrix #56255