Closed staticfloat closed 1 year ago
To test this branch out on your local Julia, run the following script:
JULIA_PREFIX=/path/to/julia
# Back up normal LBT
for f in ${JULIA_PREFIX}/lib/julia/libblastrampoline*; do
cp -v ${f} ${f}.backup
done
# Build new LBT
cd src
make
# Install new LBT into Julia's lib/julia directory
cp build/libblastrampoline.5.dylib ${JULIA_PREFIX}/lib/julia/libblastrampoline.5.dylib
cp build/libblastrampoline.5.dylib ${JULIA_PREFIX}/lib/julia/libblastrampoline.dylib
Then, when you launch Julia, load Accelerate via:
using LinearAlgebra
# Load LP64 interface first
BLAS.lbt_forward("/System/Library/Frameworks/Accelerate.framework/Accelerate"; suffix_hint="\x1a\$NEWLAPACK", verbose=true, clear=true)
# Load ILP64 interface next
BLAS.lbt_forward("/System/Library/Frameworks/Accelerate.framework/Accelerate"; suffix_hint="\x1a\$NEWLAPACK\$ILP64", verbose=true)
# Ensure that an ILP64 interface was actually loaded:
config = BLAS.get_config()
if !any(lib.interface == :ilp64 for lib in config.loaded_libs)
@error("No ILP64 interfaces found; are you sure you're on macOS 13.3 or higher?!")
end
# Show LBT config:
display(BLAS.get_config())
Verified that this works for me, and for peakflops()
gives twice the flop rate.
Can someone try running the LinearAlgebra tests with Accelerate loaded?
Results of the LinearAlgebra tests. Looks pretty good. The get_num_threads
needs adjusting for Accelerate, and very few tests are failing.
➜ julia git:(vs/accelerate64) ✗ ./julia test/runtests.jl LinearAlgebra
Running parallel tests with:
nworkers() = 8
nthreads() = 1
Sys.CPU_THREADS = 8
Sys.total_memory() = 32.000 GiB
Sys.free_memory() = 12.930 GiB
Test (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
LinearAlgebra/matmul (4) | started at 2023-04-01T23:41:15.776
LinearAlgebra/diagonal (7) | started at 2023-04-01T23:41:15.820
LinearAlgebra/addmul (2) | started at 2023-04-01T23:41:15.861
LinearAlgebra/special (8) | started at 2023-04-01T23:41:15.861
LinearAlgebra/dense (5) | started at 2023-04-01T23:41:15.862
LinearAlgebra/triangular (3) | started at 2023-04-01T23:41:15.862
LinearAlgebra/symmetric (6) | started at 2023-04-01T23:41:15.862
LinearAlgebra/bidiag (9) | started at 2023-04-01T23:41:15.862
LinearAlgebra/special (8) | 83.84 | 2.54 | 3.0 | 13877.34 | 1423.22
LinearAlgebra/qr (8) | started at 2023-04-01T23:42:39.839
LinearAlgebra/bidiag (9) | 90.00 | 2.85 | 3.2 | 13317.18 | 1635.91
LinearAlgebra/cholesky (9) | started at 2023-04-01T23:42:45.923
LinearAlgebra/dense (5) | 115.04 | 4.13 | 3.6 | 17711.68 | 1667.75
LinearAlgebra/blas (5) | started at 2023-04-01T23:43:10.965
LinearAlgebra/diagonal (7) | 119.24 | 4.46 | 3.7 | 18937.22 | 1695.06
LinearAlgebra/lu (7) | started at 2023-04-01T23:43:15.170
LinearAlgebra/qr (8) | 42.00 | 1.19 | 2.8 | 6571.17 | 2135.47
LinearAlgebra/uniformscaling (8) | started at 2023-04-01T23:43:21.846
LinearAlgebra/cholesky (9) | failed at 2023-04-01T23:43:26.173
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)
LinearAlgebra/structuredbroadcast (10) | started at 2023-04-01T23:43:28.313
LinearAlgebra/blas (5) | failed at 2023-04-01T23:43:30.921
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/blas.jl:678
Expression: default > 0
Evaluated: 0 > 0
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/blas.jl:680
Expression: BLAS.get_num_threads() === 1
Evaluated: 0 === 1
LinearAlgebra/symmetric (6) | 136.01 | 3.62 | 2.7 | 19231.91 | 1740.31
LinearAlgebra/hessenberg (6) | started at 2023-04-01T23:43:31.953
LinearAlgebra/svd (11) | started at 2023-04-01T23:43:32.994
LinearAlgebra/matmul (4) | 140.62 | 4.53 | 3.2 | 21742.65 | 1389.61
LinearAlgebra/eigen (4) | started at 2023-04-01T23:43:36.533
LinearAlgebra/structuredbroadcast (10) | 17.92 | 0.77 | 4.3 | 3234.67 | 613.28
LinearAlgebra/tridiag (10) | started at 2023-04-01T23:43:46.360
LinearAlgebra/uniformscaling (8) | 25.05 | 1.04 | 4.2 | 3629.15 | 2428.69
LinearAlgebra/lapack (8) | started at 2023-04-01T23:43:46.899
LinearAlgebra/hessenberg (6) | 16.74 | 0.53 | 3.2 | 2489.17 | 1923.83
LinearAlgebra/lq (6) | started at 2023-04-01T23:43:48.694
LinearAlgebra/lu (7) | 41.35 | 1.17 | 2.8 | 6116.89 | 2578.94
LinearAlgebra/adjtrans (7) | started at 2023-04-01T23:43:56.532
LinearAlgebra/lapack (8) | 13.14 | 0.34 | 2.6 | 1572.33 | 2486.33
LinearAlgebra/generic (8) | started at 2023-04-01T23:44:00.046
LinearAlgebra/lq (6) | 16.15 | 0.49 | 3.0 | 2499.47 | 2104.75
LinearAlgebra/schur (6) | started at 2023-04-01T23:44:04.843
LinearAlgebra/eigen (4) | 28.47 | 1.04 | 3.7 | 4258.50 | 1654.45
LinearAlgebra/bunchkaufman (4) | started at 2023-04-01T23:44:05.008
LinearAlgebra/svd (11) | 32.77 | 1.21 | 3.7 | 5443.56 | 807.22
LinearAlgebra/givens (11) | started at 2023-04-01T23:44:05.893
LinearAlgebra/adjtrans (7) | 11.40 | 0.70 | 6.1 | 2016.61 | 2931.66
LinearAlgebra/pinv (7) | started at 2023-04-01T23:44:07.945
LinearAlgebra/givens (11) | 3.25 | 0.15 | 4.5 | 480.30 | 1011.66
LinearAlgebra/factorization (11) | started at 2023-04-01T23:44:09.141
LinearAlgebra/factorization (11) | 1.58 | 0.03 | 1.9 | 250.17 | 1033.64
LinearAlgebra/abstractq (11) | started at 2023-04-01T23:44:10.736
LinearAlgebra/pinv (7) | 3.67 | 0.24 | 6.4 | 817.49 | 3158.14
LinearAlgebra/ldlt (7) | started at 2023-04-01T23:44:11.615
LinearAlgebra/ldlt (7) | 0.31 | 0.00 | 0.0 | 47.32 | 3160.14
LinearAlgebra/abstractq (11) | failed at 2023-04-01T23:44:12.844
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/abstractq.jl:59
Expression: Q[:, 1:3] == Q.Q[:, 1:3] == (Matrix(Q))[:, 1:3]
Evaluated: [-0.35953221510956435 0.1965454404197131 0.519592431137261; -0.22582192557737624 -0.6725058409923694 0.6076020971223726; … ; -0.09448452509982379 -0.655284066450569 -0.5071271232591662; -0.5324407150637704 0.28215697975293297 -0.10211308565740215] == [-0.35953221510956435 0.1965454404197131 0.519592431137261; -0.22582192557737624 -0.6725058409923694 0.6076020971223726; … ; -0.09448452509982379 -0.655284066450569 -0.5071271232591662; -0.5324407150637704 0.28215697975293297 -0.10211308565740215] == [-0.35953221510956435 0.1965454404197131 0.5195924311372611; -0.22582192557737624 -0.6725058409923694 0.6076020971223726; … ; -0.09448452509982379 -0.655284066450569 -0.5071271232591662; -0.5324407150637704 0.28215697975293297 -0.10211308565740214]
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/abstractq.jl:59
Expression: Q[:, 1:3] == Q.Q[:, 1:3] == (Matrix(Q))[:, 1:3]
Evaluated: ComplexF64[-0.49837132037257037 - 0.4782120069360776im -0.150972588569905 - 0.28336417274098175im 0.11710459089697636 + 0.24599737894264181im; -0.5297065808419519 - 0.2180434544860795im -0.258256476833034 + 0.09563123480722824im -0.269175148885196 - 0.16302309774541907im; … ; -0.27303350965655987 - 0.2065902594255449im 0.27335708885184995 + 0.3499871631428588im 0.4250421451010471 - 0.06661317167629388im; -0.05787258293172755 - 0.2426695859781627im 0.5018747549963603 + 0.4058519997055557im -0.016960609747618538 - 0.398499749909367im] == ComplexF64[-0.49837132037257037 - 0.4782120069360776im -0.150972588569905 - 0.28336417274098175im 0.11710459089697636 + 0.24599737894264181im; -0.5297065808419519 - 0.2180434544860795im -0.258256476833034 + 0.09563123480722824im -0.269175148885196 - 0.16302309774541907im; … ; -0.27303350965655987 - 0.2065902594255449im 0.27335708885184995 + 0.3499871631428588im 0.4250421451010471 - 0.06661317167629388im; -0.05787258293172755 - 0.2426695859781627im 0.5018747549963603 + 0.4058519997055557im -0.016960609747618538 - 0.398499749909367im] == ComplexF64[-0.49837132037257037 - 0.4782120069360776im -0.150972588569905 - 0.28336417274098175im 0.11710459089697636 + 0.24599737894264181im; -0.5297065808419519 - 0.2180434544860795im -0.258256476833034 + 0.09563123480722824im -0.269175148885196 - 0.16302309774541907im; … ; -0.27303350965655987 - 0.2065902594255449im 0.27335708885184995 + 0.3499871631428588im 0.4250421451010471 - 0.0666131716762939im; -0.05787258293172755 - 0.2426695859781627im 0.5018747549963603 + 0.4058519997055557im -0.016960609747618538 - 0.3984997499093669im]
LinearAlgebra/tridiag (10) | 28.33 | 0.88 | 3.1 | 4643.44 | 1031.50
LinearAlgebra/generic (8) | 17.30 | 0.60 | 3.5 | 2544.61 | 2719.42
LinearAlgebra/bunchkaufman (4) | 12.82 | 0.54 | 4.2 | 1991.96 | 1759.48
LinearAlgebra/addmul (2) | 191.61 | 4.16 | 2.2 | 22170.09 | 1365.72
LinearAlgebra/triangular (3) | 220.06 | 7.61 | 3.5 | 33463.90 | 3482.58
LinearAlgebra/schur (6) | 65.27 | 0.25 | 0.4 | 1307.90 | 2300.78
Test Summary: | Pass Fail Broken Total Time
Overall | 97516 9 17 97542 3m55.4s
LinearAlgebra/special | 3675 3675 1m24.7s
LinearAlgebra/bidiag | 4828 4828 1m30.8s
LinearAlgebra/dense | 8561 8561 1m55.9s
LinearAlgebra/diagonal | 2945 2945 2m00.1s
LinearAlgebra/qr | 4782 4782 42.0s
LinearAlgebra/cholesky | 2506 5 2511 40.3s
LinearAlgebra/blas | 1628 2 1630 20.0s
LinearAlgebra/symmetric | 3000 3000 2m16.8s
LinearAlgebra/matmul | 1496 1496 2m21.4s
LinearAlgebra/structuredbroadcast | 672 672 18.4s
LinearAlgebra/uniformscaling | 446 446 25.0s
LinearAlgebra/hessenberg | 667 667 16.7s
LinearAlgebra/lu | 1380 1380 41.4s
LinearAlgebra/lapack | 818 818 13.2s
LinearAlgebra/lq | 2898 2898 16.1s
LinearAlgebra/eigen | 951 951 28.5s
LinearAlgebra/svd | 606 606 33.2s
LinearAlgebra/adjtrans | 479 479 11.4s
LinearAlgebra/givens | 1991 1991 3.3s
LinearAlgebra/factorization | 85 16 101 1.6s
LinearAlgebra/pinv | 500 500 3.7s
LinearAlgebra/ldlt | 8 8 0.3s
LinearAlgebra/abstractq | 104 2 106 2.1s
LinearAlgebra/tridiag | 1621 1621 28.4s
LinearAlgebra/generic | 720 1 721 17.3s
LinearAlgebra/bunchkaufman | 5689 5689 12.8s
LinearAlgebra/addmul | 5688 5688 3m12.5s
LinearAlgebra/triangular | 38272 38272 3m40.9s
LinearAlgebra/schur | 500 500 1m05.3s
FAILURE
The global RNG seed was 0x2d0c35f148fbf575674e2e79e883fe44.
Error in testset LinearAlgebra/cholesky:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)
Error in testset LinearAlgebra/cholesky:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)
Error in testset LinearAlgebra/cholesky:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)
Error in testset LinearAlgebra/cholesky:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)
Error in testset LinearAlgebra/cholesky:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/cholesky.jl:149
Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)
Error in testset LinearAlgebra/blas:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/blas.jl:678
Expression: default > 0
Evaluated: 0 > 0
Error in testset LinearAlgebra/blas:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/blas.jl:680
Expression: BLAS.get_num_threads() === 1
Evaluated: 0 === 1
Error in testset LinearAlgebra/abstractq:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/abstractq.jl:59
Expression: Q[:, 1:3] == Q.Q[:, 1:3] == (Matrix(Q))[:, 1:3]
Evaluated: [-0.35953221510956435 0.1965454404197131 0.519592431137261; -0.22582192557737624 -0.6725058409923694 0.6076020971223726; … ; -0.09448452509982379 -0.655284066450569 -0.5071271232591662; -0.5324407150637704 0.28215697975293297 -0.10211308565740215] == [-0.35953221510956435 0.1965454404197131 0.519592431137261; -0.22582192557737624 -0.6725058409923694 0.6076020971223726; … ; -0.09448452509982379 -0.655284066450569 -0.5071271232591662; -0.5324407150637704 0.28215697975293297 -0.10211308565740215] == [-0.35953221510956435 0.1965454404197131 0.5195924311372611; -0.22582192557737624 -0.6725058409923694 0.6076020971223726; … ; -0.09448452509982379 -0.655284066450569 -0.5071271232591662; -0.5324407150637704 0.28215697975293297 -0.10211308565740214]
Error in testset LinearAlgebra/abstractq:
Test Failed at /Users/viral/julia/usr/share/julia/stdlib/v1.10/LinearAlgebra/test/abstractq.jl:59
Expression: Q[:, 1:3] == Q.Q[:, 1:3] == (Matrix(Q))[:, 1:3]
Evaluated: ComplexF64[-0.49837132037257037 - 0.4782120069360776im -0.150972588569905 - 0.28336417274098175im 0.11710459089697636 + 0.24599737894264181im; -0.5297065808419519 - 0.2180434544860795im -0.258256476833034 + 0.09563123480722824im -0.269175148885196 - 0.16302309774541907im; … ; -0.27303350965655987 - 0.2065902594255449im 0.27335708885184995 + 0.3499871631428588im 0.4250421451010471 - 0.06661317167629388im; -0.05787258293172755 - 0.2426695859781627im 0.5018747549963603 + 0.4058519997055557im -0.016960609747618538 - 0.398499749909367im] == ComplexF64[-0.49837132037257037 - 0.4782120069360776im -0.150972588569905 - 0.28336417274098175im 0.11710459089697636 + 0.24599737894264181im; -0.5297065808419519 - 0.2180434544860795im -0.258256476833034 + 0.09563123480722824im -0.269175148885196 - 0.16302309774541907im; … ; -0.27303350965655987 - 0.2065902594255449im 0.27335708885184995 + 0.3499871631428588im 0.4250421451010471 - 0.06661317167629388im; -0.05787258293172755 - 0.2426695859781627im 0.5018747549963603 + 0.4058519997055557im -0.016960609747618538 - 0.398499749909367im] == ComplexF64[-0.49837132037257037 - 0.4782120069360776im -0.150972588569905 - 0.28336417274098175im 0.11710459089697636 + 0.24599737894264181im; -0.5297065808419519 - 0.2180434544860795im -0.258256476833034 + 0.09563123480722824im -0.269175148885196 - 0.16302309774541907im; … ; -0.27303350965655987 - 0.2065902594255449im 0.27335708885184995 + 0.3499871631428588im 0.4250421451010471 - 0.0666131716762939im; -0.05787258293172755 - 0.2426695859781627im 0.5018747549963603 + 0.4058519997055557im -0.016960609747618538 - 0.3984997499093669im]
ERROR: LoadError: Test run finished with errors
in expression starting at /Users/viral/julia/test/runtests.jl:93
In the QR tests, it is a difference on the order of eps.
julia> Q[:,1:3] - Matrix(Q)[:,1:3]
5×3 Matrix{Float64}:
0.0 0.0 -5.55112e-17
0.0 0.0 -1.38778e-17
0.0 0.0 0.0
0.0 0.0 0.0
0.0 0.0 0.0
The pivoted cholesky failure is reproduced as follows:
julia> using LinearAlgebra, Test
julia> apdh = Float32[3.4291134 -0.61112815 0.8155207 -0.9183448 -0.61426914 1.021518 -0.07258626 -1.9560734 -1.6641214 0.4579194; -0.61112815 3.1250076 -0.44400603 0.97749346 -0.20464422 0.4505959 -1.1918031 -1.0662653 -0.5070573 -0.06222758; 0.8155207 -0.44400603 0.5413656 0.53000593 -0.4061885 0.028969476 -0.45150727 -0.00913016 -0.61976415 -0.17616473; -0.9183448 0.97749346 0.53000593 5.108155 -0.44426316 -0.5661762 -1.5497217 0.012556124 -1.0134082 0.52672774; -0.61426914 -0.20464422 -0.4061885 -0.44426316 2.1013746 -0.0008714596 1.9995844 -0.80452967 0.81173474 1.0536224; 1.021518 0.4505959 0.028969476 -0.5661762 -0.0008714596 2.52233 0.46488014 -0.90563935 -0.44942248 -0.03283302; -0.07258626 -1.1918031 -0.45150727 -1.5497217 1.9995844 0.46488014 3.378583 -0.98002505 0.58866936 0.6275643; -1.9560734 -1.0662653 -0.00913016 0.012556124 -0.80452967 -0.90563935 -0.98002505 3.117008 1.2102712 -1.4144715; -1.6641214 -0.5070573 -0.61976415 -1.0134082 0.81173474 -0.44942248 0.58866936 1.2102712 1.9170706 0.44138947; 0.4579194 -0.06222758 -0.17616473 0.52672774 1.0536224 -0.03283302 0.6275643 -1.4144715 0.44138947 2.4111974]
10×10 Matrix{Float32}:
3.42911 -0.611128 0.815521 -0.918345 -0.614269 1.02152 -0.0725863 -1.95607 -1.66412 0.457919
-0.611128 3.12501 -0.444006 0.977493 -0.204644 0.450596 -1.1918 -1.06627 -0.507057 -0.0622276
0.815521 -0.444006 0.541366 0.530006 -0.406188 0.0289695 -0.451507 -0.00913016 -0.619764 -0.176165
-0.918345 0.977493 0.530006 5.10815 -0.444263 -0.566176 -1.54972 0.0125561 -1.01341 0.526728
-0.614269 -0.204644 -0.406188 -0.444263 2.10137 -0.00087146 1.99958 -0.80453 0.811735 1.05362
1.02152 0.450596 0.0289695 -0.566176 -0.00087146 2.52233 0.46488 -0.905639 -0.449422 -0.032833
-0.0725863 -1.1918 -0.451507 -1.54972 1.99958 0.46488 3.37858 -0.980025 0.588669 0.627564
-1.95607 -1.06627 -0.00913016 0.0125561 -0.80453 -0.905639 -0.980025 3.11701 1.21027 -1.41447
-1.66412 -0.507057 -0.619764 -1.01341 0.811735 -0.449422 0.588669 1.21027 1.91707 0.441389
0.457919 -0.0622276 -0.176165 0.526728 1.05362 -0.032833 0.627564 -1.41447 0.441389 2.4112
julia> cpapd = cholesky(apdh, RowMaximum())
CholeskyPivoted{Float32, Matrix{Float32}, Vector{Int64}}
U factor with rank 10:
10×10 UpperTriangular{Float32, Matrix{Float32}}:
2.26012 -0.406325 0.432496 -0.68568 -0.250507 -0.196566 -0.448386 0.0055555 0.234503 0.233053
⋅ 1.80666 -0.240994 -0.19439 0.509079 -0.384212 -1.02195 -1.08145 0.504138 0.305877
⋅ ⋅ 1.69702 -0.555147 0.401659 -0.125056 -0.329646 -0.78331 -0.24981 -0.052626
⋅ ⋅ ⋅ 1.60077 0.384224 1.07492 -0.0627441 -1.01282 -0.207023 0.510761
⋅ ⋅ ⋅ ⋅ 1.3753 -0.158 0.0836318 0.254541 0.00796201 -0.22197
⋅ ⋅ ⋅ ⋅ ⋅ 0.847978 0.436789 -0.221684 0.0308417 0.738552
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.601048 -0.230679 -0.181164 0.946928
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.375373 0.24624 -0.453482
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.180238 -0.167688
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.523937
permutation:
10-element Vector{Int64}:
4
1
2
7
6
5
9
8
3
10
julia> @test all(diff(diag(real(cpapd.factors))).<=0.) # diagonal should be non-increasing
Test Failed at REPL[110]:1
Expression: all(diff(diag(real(cpapd.factors))) .<= 0.0)
ERROR: There was an error during testing
Any ideas? @amontoison @dkarrasch
@staticfloat I believe it is fine to merge this PR.
Any ideas? @amontoison @dkarrasch
I highly suspect a bug in the new BLAS / LAPACK routines of Apple Accelerate. I have a different result with OpenBLAS and Intel MKL.
julia> cpapd = cholesky(apdh, RowMaximum())
CholeskyPivoted{Float32, Matrix{Float32}, Vector{Int64}}
U factor with rank 10:
10×10 UpperTriangular{Float32, Matrix{Float32}}:
2.26012 -0.406325 0.432496 -0.68568 0.233053 -0.250507 -0.196566 -0.448386 0.0055555 0.234503
⋅ 1.80666 -0.240994 -0.19439 0.305877 0.509079 -0.384212 -1.02195 -1.08145 0.504138
⋅ ⋅ 1.69702 -0.555147 -0.052626 0.401659 -0.125056 -0.329646 -0.78331 -0.24981
⋅ ⋅ ⋅ 1.60077 0.510761 0.384224 1.07492 -0.0627441 -1.01282 -0.207023
⋅ ⋅ ⋅ ⋅ 1.4141 -0.21588 0.467681 0.617479 -0.430583 -0.206794
⋅ ⋅ ⋅ ⋅ ⋅ 1.35825 -0.0856505 0.182823 0.1893 -0.0248058
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.7197 0.116784 -0.014745 0.166019
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.361517 0.0519991 0.0500296
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.289647 0.155059
⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ 0.17166
permutation:
10-element Vector{Int64}:
4
1
2
7
10
6
5
9
8
3
@amontoison That is what I observed too with openblas, and I was just wondering if we are doing something non-standard or making assumptions in the way we call those routines right now. Great idea to check with MKL, because that suggests that it really is a bug.
The one thing I want to do before merging this PR is to provide sane fallback behavior for the set/get threads API when the underlying library has no concept of threads. My plan is to make set threads do nothing, and get threads always return 1
.
Does someone here know what LAPACK call is the culprit for the wrong pivoted cholesky results? Is it xPSTRF
?
Yes, it seems that it's pstrf.
Do the Intel macs also have the updated ILP64 BLAS and LAPACK?
Yes. @staticfloat 's instructions worked fine on my 1999 Intel Mac.
I'm reporting the dpstrf
bug to Apple filed as FB 12098971, easy reproducer available here: https://gist.github.com/staticfloat/494272e86b44a3e2d31de898473b3c64
I suspect that the old LAPACK release that Apple used to ship (LAPACK 3.1 IIRC) didn't even have dpstrf
. So may be ok to not worry about it.
I assume the plan for now should be for LBT to support all the new stuff, but Julia to default to openblas, with users opting into using the new capabilities through AppleAccelerate.jl (like we do MKL.jl).
I suspect that the old LAPACK release that Apple used to ship (LAPACK 3.1 IIRC) didn't even have dpstrf. So may be ok to not worry about it.
The old LAPACK release did have dsptrf_
, and it calculated it correctly. So it appears to be a regression in the new LAPACK.
I assume the plan for now should be for LBT to support all the new stuff, but Julia to default to openblas, with users opting into using the new capabilities through AppleAccelerate.jl (like we do MKL.jl).
Yes. I think AppleAccelerate.jl
should incorporate some LBT invocations similar to what I posted in this thread, so that it can load the ILP64 and new LP64-LAPACK bindings instead of the old ones. But that can only happen after this new release is cut.
Will we have the release 5.6.0 of LBT with Julia 1.9?
We should open the PR on Julia master and mark it for backporting. Even if not 1.9, I think we can have it in 1.9.1.
Apple got back to me; they have identified the issue and are trying to get a fix into a future macOS version.
We should open the PR on Julia master and mark it for backporting. Even if not 1.9, I think we can have it in 1.9.1.
Thanks @ViralBShah! I'm working with the HSL team to release of JuliaHSL package with a precompiled HSL_jll
inside (we can't precompile it with Yggdrasil because of the academic license).
It convinces us to precompile HSL_jll
with LBT for Julia ≥ 1.9.
Wouldn't the licensing issue be the same if inside HSL.jl vs BB? Perhaps the right thing is for them to serve the JLL through a separate registry.
The new Accelerate released in macOS v13.3 provides two new interfaces; an upgraded LAPACK for the LP64 interface, and an all-new ILP64 interface (that uses the same upgraded LAPACK). These symbols are available from Accelerate with the suffix
$NEWLAPACK
and$NEWLAPACK$ILP64
, respectively.Unfortunately, this is not a "true" suffix, as Apple has decided to drop the trailing underscore from the typical F77 names, meaning that a symbol such as
dgemm_
gets mangled todgemm$NEWLAPACK
, whereas a CBLAS symbol such ascblas_zdotc_sub
gets mangled tocblas_zdotc_sub$NEWLAPACK
. This means that we need to selectively erase the trailing underscore from some symbols when applying this Accelerate suffix.To do this, we add a new feature, enabled by default only on Apple builds, called
SYMBOL_TRIMMING
, which allows asuffix_hint
to contain the ASCII "substitution character"0x1a
as the first character of the suffix hint to mean "remove a trailing underscore when applying this suffix".To make dealing with suffix hints easier for command-line users, these suffix hints are available for use in
LBT_BACKING_LIBS
by listing libraries separated by suffix hints with an exclamation point, e.g.libname!suffix
.