JuliaLinearAlgebra / AppleAccelerate.jl

Julia interface to the macOS Accelerate framework

Use LBT to forward BLAS and LAPACK calls to Accelerate #58

Closed by staticfloat 1 year ago

staticfloat commented 1 year ago

This throws away most of the previous version and re-architects the package around libblastrampoline (LBT), so that BLAS and LAPACK operations transparently use Accelerate. The DSP functionality can be re-introduced later, potentially in a separate package if we want to keep this one lightweight, since it may end up at the bottom of many dependency trees.

With this re-architecture, Accelerate passes the full LinearAlgebra test suite, thanks to an external LAPACK_jll that papers over bugs in dsptrf() (hopefully no longer necessary after a future macOS update).

Fixes #45
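
As a rough illustration of the LBT forwarding described above, here is a minimal sketch that lists the backends LBT currently forwards to, before and after loading the package. It assumes Julia 1.7 or newer (for `BLAS.get_config()`); `loaded_backends` is a hypothetical helper name, and the `using AppleAccelerate` step is only attempted on macOS when the package is installed.

```julia
using LinearAlgebra

# Hypothetical helper: file names of the BLAS/LAPACK libraries that
# libblastrampoline (LBT) currently forwards calls to.
loaded_backends() = [basename(lib.libname) for lib in BLAS.get_config().loaded_libs]

println("Before: ", loaded_backends())

# Assumption: AppleAccelerate is installed and the macOS version is recent enough.
# On other systems the default backend (typically OpenBLAS) stays in place.
if Sys.isapple() && Base.find_package("AppleAccelerate") !== nothing
    @eval using AppleAccelerate
    println("After:  ", loaded_backends())
end
```

On a machine without Accelerate, only the "Before" line is printed, which is still a quick way to see which backend a session is using.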

staticfloat commented 1 year ago

This will naturally fail CI on any macOS older than 13.3.

Anecdotally, Accelerate on my M1 Pro runs the LinearAlgebra test suite pretty quickly:

Running parallel tests with:
  nworkers() = 8
  nthreads() = 1
  Sys.CPU_THREADS = 8
  Sys.total_memory() = 16.000 GiB
  Sys.free_memory() = 1.479 GiB

Test                          (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
LinearAlgebra/bidiag               (9) |        started at 2023-04-12T12:13:15.792
LinearAlgebra/diagonal             (7) |        started at 2023-04-12T12:13:15.839
LinearAlgebra/special              (8) |        started at 2023-04-12T12:13:15.883
LinearAlgebra/symmetric            (6) |        started at 2023-04-12T12:13:15.883
LinearAlgebra/triangular           (3) |        started at 2023-04-12T12:13:15.884
LinearAlgebra/addmul               (2) |        started at 2023-04-12T12:13:15.884
LinearAlgebra/matmul               (4) |        started at 2023-04-12T12:13:15.884
LinearAlgebra/dense                (5) |        started at 2023-04-12T12:13:15.884
LinearAlgebra/special              (8) |    97.45 |   2.87 |  2.9 |   13718.53 |   875.08
LinearAlgebra/qr                   (8) |        started at 2023-04-12T12:14:53.531
LinearAlgebra/bidiag               (9) |   106.62 |   3.55 |  3.3 |   13216.62 |  1039.58
LinearAlgebra/cholesky             (9) |        started at 2023-04-12T12:15:02.584
LinearAlgebra/dense                (5) |   142.17 |   5.91 |  4.2 |   17388.80 |  1441.88
LinearAlgebra/blas                 (5) |        started at 2023-04-12T12:15:38.173
LinearAlgebra/diagonal             (7) |   147.45 |   6.44 |  4.4 |   18475.31 |  1209.39
LinearAlgebra/lu                   (7) |        started at 2023-04-12T12:15:43.451
LinearAlgebra/qr                   (8) |    54.92 |   2.56 |  4.7 |    6552.89 |   945.16
LinearAlgebra/uniformscaling       (8) |        started at 2023-04-12T12:15:48.463
LinearAlgebra/cholesky             (9) |    54.48 |   3.47 |  6.4 |    5250.13 |  1039.58
LinearAlgebra/structuredbroadcast  (9) |        started at 2023-04-12T12:15:57.069
LinearAlgebra/addmul               (2) |   163.83 |   4.79 |  2.9 |   15629.88 |   625.73
LinearAlgebra/hessenberg           (2) |        started at 2023-04-12T12:15:59.829
LinearAlgebra/symmetric            (6) |   169.04 |   6.57 |  3.9 |   18915.86 |  1102.09
LinearAlgebra/svd                  (6) |        started at 2023-04-12T12:16:05.030
LinearAlgebra/matmul               (4) |   175.18 |   6.65 |  3.8 |   21358.36 |   831.70
LinearAlgebra/eigen                (4) |        started at 2023-04-12T12:16:11.197
LinearAlgebra/blas                 (5) |    33.83 |   2.18 |  6.4 |    2384.33 |  1441.88
LinearAlgebra/tridiag              (5) |        started at 2023-04-12T12:16:12.010
LinearAlgebra/structuredbroadcast  (9) |    31.07 |   3.32 | 10.7 |    2900.56 |  1039.58
LinearAlgebra/lapack               (9) |        started at 2023-04-12T12:16:28.183
LinearAlgebra/uniformscaling       (8) |    47.64 |   3.37 |  7.1 |    3557.69 |  1105.19
LinearAlgebra/lq                   (8) |        started at 2023-04-12T12:16:36.137
LinearAlgebra/hessenberg           (2) |    47.76 |   3.17 |  6.6 |    3818.54 |   712.42
LinearAlgebra/adjtrans             (2) |        started at 2023-04-12T12:16:47.611
LinearAlgebra/svd                  (6) |    44.37 |   5.12 | 11.5 |    3351.13 |  1102.09
LinearAlgebra/generic              (6) |        started at 2023-04-12T12:16:49.421
LinearAlgebra/lapack               (9) |    28.29 |   2.55 |  9.0 |    1628.27 |  1039.58
LinearAlgebra/schur                (9) |        started at 2023-04-12T12:16:56.510
LinearAlgebra/tridiag              (5) |    47.93 |   4.71 |  9.8 |    2726.56 |  1441.88
LinearAlgebra/bunchkaufman         (5) |        started at 2023-04-12T12:16:59.965
LinearAlgebra/lq                   (8) |    33.69 |   3.54 | 10.5 |    1793.97 |  1105.19
LinearAlgebra/givens               (8) |        started at 2023-04-12T12:17:09.844
LinearAlgebra/lu                   (7) |    94.47 |  10.87 | 11.5 |    5976.82 |  1209.39
LinearAlgebra/pinv                 (7) |        started at 2023-04-12T12:17:17.950
LinearAlgebra/adjtrans             (2) |    31.05 |   3.38 | 10.9 |    2257.61 |   728.20
LinearAlgebra/factorization        (2) |        started at 2023-04-12T12:17:18.677
LinearAlgebra/eigen                (4) |    68.53 |   7.26 | 10.6 |    4228.16 |   831.70
LinearAlgebra/abstractq            (4) |        started at 2023-04-12T12:17:19.739
LinearAlgebra/givens               (8) |    10.21 |   1.82 | 17.8 |     397.91 |  1105.19
LinearAlgebra/ldlt                 (8) |        started at 2023-04-12T12:17:20.074
LinearAlgebra/ldlt                 (8) |     1.06 |   0.00 |  0.0 |      61.72 |  1105.19
LinearAlgebra/factorization        (2) |     4.06 |   0.49 | 12.0 |     304.59 |   815.39
LinearAlgebra/abstractq            (4) |     3.86 |   0.24 |  6.2 |     331.80 |   913.75
LinearAlgebra/bunchkaufman         (5) |    23.79 |   2.78 | 11.7 |    1370.65 |  1441.88
LinearAlgebra/pinv                 (7) |     6.97 |   0.75 | 10.7 |     855.20 |  1428.39
LinearAlgebra/generic              (6) |    38.02 |   3.72 |  9.8 |    2491.66 |  1226.55
LinearAlgebra/schur                (9) |    84.99 |   1.83 |  2.2 |    1404.24 |  1039.58
LinearAlgebra/triangular           (3) |   306.81 |  18.81 |  6.1 |   33163.34 |  2196.91

Test Summary: |  Pass  Broken  Total     Time
  Overall     | 96483      17  96500  5m08.2s
    SUCCESS
Test Summary:                 |   Time
Full LinearAlgebra test suite | None  5m12.7s
     Testing AppleAccelerate tests passed

Versus OpenBLAS:

Running parallel tests with:
  nworkers() = 8
  nthreads() = 1
  Sys.CPU_THREADS = 8
  Sys.total_memory() = 16.000 GiB
  Sys.free_memory() = 2.436 GiB

Test                          (Worker) | Time (s) | GC (s) | GC % | Alloc (MB) | RSS (MB)
LinearAlgebra/diagonal             (7) |        started at 2023-04-12T12:19:43.415
LinearAlgebra/special              (8) |        started at 2023-04-12T12:19:43.479
LinearAlgebra/matmul               (4) |        started at 2023-04-12T12:19:43.582
LinearAlgebra/addmul               (2) |        started at 2023-04-12T12:19:43.582
LinearAlgebra/triangular           (3) |        started at 2023-04-12T12:19:43.583
LinearAlgebra/symmetric            (6) |        started at 2023-04-12T12:19:43.583
LinearAlgebra/dense                (5) |        started at 2023-04-12T12:19:43.583
LinearAlgebra/bidiag               (9) |        started at 2023-04-12T12:19:43.586
LinearAlgebra/special              (8) |   101.46 |   3.33 |  3.3 |   13718.61 |   917.61
LinearAlgebra/qr                   (8) |        started at 2023-04-12T12:21:25.276
LinearAlgebra/bidiag               (9) |   114.20 |   3.99 |  3.5 |   13216.68 |   956.81
LinearAlgebra/cholesky             (9) |        started at 2023-04-12T12:21:37.868
LinearAlgebra/dense                (5) |   145.96 |   5.71 |  3.9 |   17388.89 |  1106.66
LinearAlgebra/blas                 (5) |        started at 2023-04-12T12:22:09.665
LinearAlgebra/diagonal             (7) |   158.96 |   7.93 |  5.0 |   18475.02 |  1177.16
LinearAlgebra/lu                   (7) |        started at 2023-04-12T12:22:22.653
LinearAlgebra/qr                   (8) |    57.88 |   3.25 |  5.6 |    6552.97 |   990.30
LinearAlgebra/uniformscaling       (8) |        started at 2023-04-12T12:22:23.163
LinearAlgebra/cholesky             (9) |    56.94 |   3.20 |  5.6 |    5249.92 |   979.05
LinearAlgebra/structuredbroadcast  (9) |        started at 2023-04-12T12:22:34.828
LinearAlgebra/symmetric            (6) |   175.63 |   7.43 |  4.2 |   18916.03 |  1136.31
LinearAlgebra/hessenberg           (6) |        started at 2023-04-12T12:22:39.359
LinearAlgebra/blas                 (5) |    34.75 |   2.57 |  7.4 |    2384.31 |  1226.16
LinearAlgebra/svd                  (5) |        started at 2023-04-12T12:22:44.459
LinearAlgebra/matmul               (4) |   183.24 |   7.50 |  4.1 |   21417.15 |   749.80
LinearAlgebra/eigen                (4) |        started at 2023-04-12T12:22:46.928
LinearAlgebra/structuredbroadcast  (9) |    33.27 |   3.61 | 10.9 |    2900.77 |   979.05
LinearAlgebra/tridiag              (9) |        started at 2023-04-12T12:23:08.125
LinearAlgebra/hessenberg           (6) |    32.13 |   2.63 |  8.2 |    2461.15 |  1136.31
LinearAlgebra/lapack               (6) |        started at 2023-04-12T12:23:11.506
LinearAlgebra/uniformscaling       (8) |    48.48 |   3.70 |  7.6 |    3557.68 |  1007.66
LinearAlgebra/lq                   (8) |        started at 2023-04-12T12:23:11.649
LinearAlgebra/svd                  (5) |    44.31 |   3.41 |  7.7 |    2903.87 |  1226.16
LinearAlgebra/adjtrans             (5) |        started at 2023-04-12T12:23:28.782
LinearAlgebra/lapack               (6) |    24.98 |   2.24 |  9.0 |    1414.08 |  1136.31
LinearAlgebra/generic              (6) |        started at 2023-04-12T12:23:36.518
LinearAlgebra/lq                   (8) |    30.77 |   3.02 |  9.8 |    1794.01 |  1007.66
LinearAlgebra/schur                (8) |        started at 2023-04-12T12:23:42.448
LinearAlgebra/tridiag              (9) |    40.00 |   3.86 |  9.7 |    2215.41 |   979.05
LinearAlgebra/bunchkaufman         (9) |        started at 2023-04-12T12:23:48.155
LinearAlgebra/eigen                (4) |    63.84 |   6.05 |  9.5 |    4228.25 |   749.80
LinearAlgebra/givens               (4) |        started at 2023-04-12T12:23:50.788
LinearAlgebra/lu                   (7) |    96.01 |  10.01 | 10.4 |    5976.80 |  1177.16
LinearAlgebra/pinv                 (7) |        started at 2023-04-12T12:23:58.674
LinearAlgebra/givens               (4) |     8.93 |   0.79 |  8.9 |     498.21 |   749.80
LinearAlgebra/factorization        (4) |        started at 2023-04-12T12:23:59.746
LinearAlgebra/adjtrans             (5) |    32.41 |   3.09 |  9.5 |    1977.57 |  1226.16
LinearAlgebra/abstractq            (5) |        started at 2023-04-12T12:24:01.227
LinearAlgebra/factorization        (4) |     4.25 |   0.65 | 15.3 |     223.63 |   749.80
LinearAlgebra/ldlt                 (4) |        started at 2023-04-12T12:24:04.039
LinearAlgebra/ldlt                 (4) |     1.40 |   0.00 |  0.0 |      70.48 |   749.80
LinearAlgebra/abstractq            (5) |     6.95 |   2.07 | 29.8 |     283.06 |  1226.16
LinearAlgebra/pinv                 (7) |    10.68 |   2.38 | 22.3 |     855.16 |  1411.39
LinearAlgebra/generic              (6) |    37.83 |   4.00 | 10.6 |    2510.75 |  1272.36
LinearAlgebra/bunchkaufman         (9) |    30.87 |   2.65 |  8.6 |    2729.90 |  1360.27
LinearAlgebra/triangular           (3) |   326.29 |  21.67 |  6.6 |   33163.40 |  2458.39
LinearAlgebra/schur                (8) |    88.67 |   2.56 |  2.9 |    1484.38 |  1007.66
LinearAlgebra/addmul               (2) |   420.11 |  13.89 |  3.3 |   37199.14 |  1532.12

Test Summary: |   Pass  Broken   Total     Time
  Overall     | 106833      17  106850  7m01.8s
    SUCCESS

Although I do see that we run more tests on OpenBLAS (106850 total vs. 96500); not sure why that is.

staticfloat commented 1 year ago

As an update, macOS v13.4 beta 3 fixes the dsptrf bug; running the LinearAlgebra test suite with only Accelerate loaded (no external LAPACK) passes!

ViralBShah commented 1 year ago

Wow, that's quick. I suppose in that case the simplest thing is to make macOS 13.4 the minimum version and then remove all the LAPACK overlay stuff.

Moblin88 commented 1 year ago

I am trying to run the ILP64 Accelerate branch on macOS 13.3.1 (on an M2 chip). I get an error when LBT tries to load LAPACK from the LAPACK_jll artifact. The error I get is:

Unable to autodetect interface type of "/Users/nicholasengelking/.julia/artifacts/65c65bc8413bbca96d1d988b65cdae3d9a64cedb/lib/liblapack.3.10.0.dylib"

This seems to indicate that there was an error in the autodetect_interface function in LBT, which tries to determine whether it's a 32-bit (LP64) or 64-bit (ILP64) integer library.

I've tried upping LAPACK_jll and running Pkg.instantiate(), but no joy. I assume this is some kind of upstream issue with artifacts, packages, or LBT, or maybe the build of the LAPACK lib?

Any help would be appreciated. I am not on the 13.4 beta with the fix for dsptrf, so my understanding is that I need to use this external LAPACK lib with Accelerate BLAS.

This is on the head of sf/ilp64_accelerate, commit d05a891.
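
For anyone debugging a similar interface-detection problem, a small sketch like the following may help. It prints the integer interface that LBT recorded for each library it did manage to load. It assumes Julia 1.7 or newer and relies only on the `interface` and `libname` fields of the entries returned by `BLAS.get_config()`; it does not reproduce or fix the autodetection error itself.

```julia
using LinearAlgebra

# Each loaded library carries the interface LBT detected for it:
# :lp64 means 32-bit integers, :ilp64 means 64-bit integers.
for lib in BLAS.get_config().loaded_libs
    println(rpad(string(lib.interface), 8), lib.libname)
end

# Collect the detected interfaces for further inspection.
interfaces = [lib.interface for lib in BLAS.get_config().loaded_libs]
```

A library that fails autodetection would not appear in this list, so comparing the output against the libraries you expected to load is one way to narrow down which one is at fault.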

ViralBShah commented 1 year ago

@Moblin88 This works for me. I just pushed an update for LAPACK 3.11 as well, and made that the minimum. Can you try it out?

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 82.50% and project coverage change: +2.54 :tada:

Comparison is base (c5186a7) 80.26% compared to head (729a176) 82.81%.

:exclamation: Current head 729a176 differs from pull request most recent head e3753ce. Consider uploading reports for the commit e3753ce to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master      #58      +/-   ##
==========================================
+ Coverage   80.26%   82.81%   +2.54%
==========================================
  Files           4        4
  Lines         152      192      +40
==========================================
+ Hits          122      159      +37
- Misses         30       33       +3
```

| [Impacted Files](https://app.codecov.io/gh/JuliaMath/AppleAccelerate.jl/pull/58?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaMath) | Coverage Δ | |
|---|---|---|
| [src/AppleAccelerate.jl](https://app.codecov.io/gh/JuliaMath/AppleAccelerate.jl/pull/58?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaMath#diff-c3JjL0FwcGxlQWNjZWxlcmF0ZS5qbA==) | `82.92% <82.50%> (-17.08%)` | :arrow_down: |

... and [1 file with indirect coverage changes](https://app.codecov.io/gh/JuliaMath/AppleAccelerate.jl/pull/58/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaMath)


ViralBShah commented 1 year ago

@staticfloat I have reinstated the earlier capabilities in this package and would like to merge this PR, if it looks good to you. The DSP and Array functions do not bring additional package dependencies, so they are perhaps OK to leave here for now.

We can refactor this into more packages later, but removing the code felt like we would forget about it. It works fine and passes tests, and hopefully will help others build further.

Moblin88 commented 1 year ago

It's working for me now on the master branch that was just merged with LAPACK 3.11.0. It's also WAYY faster to multiply large dense matrices!
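
To put a rough number on the dense matrix multiply speed, a sketch along these lines could be run once with the default backend and once after `using AppleAccelerate`. `time_matmul` is a hypothetical helper, the matrix size is arbitrary, and a serious comparison would use a benchmarking package rather than `@elapsed`.

```julia
using LinearAlgebra

# Hypothetical helper: best-of-`reps` wall time (in seconds) for an
# n×n Float64 matrix multiply through whichever BLAS is currently loaded.
function time_matmul(n::Integer; reps::Integer = 3)
    A = rand(n, n)
    B = rand(n, n)
    C = similar(A)
    mul!(C, A, B)  # warm-up run so compilation is not included in the timing
    return minimum(@elapsed(mul!(C, A, B)) for _ in 1:reps)
end

t = time_matmul(1000)
println("1000×1000 matmul: ", round(t * 1e3; digits = 2), " ms")
```

Taking the minimum over a few repetitions reduces noise from other processes; the absolute numbers depend heavily on the machine.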

ViralBShah commented 1 year ago

We will be able to remove the LAPACK dependency once macOS 13.4 is out.

amontoison commented 1 year ago

Is it possible to do a new release of AppleAccelerate.jl?

ViralBShah commented 1 year ago

My preference is to wait for macOS 13.4, remove the LAPACK dependency, and then make a release. Would you prefer sooner?

amontoison commented 1 year ago

No, that's fine. I just wanted to add a comment about AppleAccelerate.jl in the documentation of JuliaHSL and explain that using AppleAccelerate loads an LP64 BLAS/LAPACK, like using MKL.