Support Apple Accelerate and improve MKL integration

SciML / LinearSolve.jl

LinearSolve.jl: High-Performance Unified Interface for Linear Solvers in Julia. Easily switch between factorization and Krylov methods, add preconditioners, and all in one interface.

https://docs.sciml.ai/LinearSolve/stable/

Other

248 stars 53 forks source link

Support Apple Accelerate and improve MKL integration #355

Closed ChrisRackauckas closed 1 year ago

ViralBShah commented 1 year ago

Would you ever pass ipiv to aa_getrf? If so, that probably comes in as an Int64 vector and probably should be downcast to an Int32 vector.

ViralBShah commented 1 year ago

I believe the smarter thing to do is to use NEWLAPACK symbols if available, falling back to the old one if not. Not sure if getrf is better in that or the same. In case the new LAPACK library has a faster LU, this is worth the effort. Some benchmarking is necessary first.

cc @vpuri3

ChrisRackauckas commented 1 year ago

Would you ever pass ipiv to aa_getrf? If so, that probably comes in as an Int64 vector and probably should be downcast to an Int32 vector.

We allocate all of the caches so it's safe from that. It's setup so that all caches compile up front and repeated calls then reuse the cache now, even for the info ref.

codecov[bot] commented 1 year ago

Codecov Report

Merging #355 (6d5aeb4) into main (464156c) will increase coverage by 47.93%. The diff coverage is 17.74%.

@@             Coverage Diff             @@
##             main     #355       +/-   ##
===========================================
+ Coverage   25.75%   73.68%   +47.93%     
===========================================
  Files          18       19        +1     
  Lines        1254     1353       +99     
===========================================
+ Hits          323      997      +674     
+ Misses        931      356      -575

Files Changed	Coverage Δ
src/LinearSolve.jl	`90.90% <ø> (+15.90%)`	:arrow_up:
src/appleaccelerate.jl	`7.27% <7.27%> (ø)`
ext/LinearSolveMKLExt.jl	`92.30% <100.00%> (+92.30%)`	:arrow_up:

... and 15 files with indirect coverage changes

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

ChrisRackauckas commented 1 year ago

I believe the smarter thing to do is to use NEWLAPACK symbols if available, falling back to the old one if not. Not sure if getrf is better in that or the same. In case the new LAPACK library has a faster LU, this is worth the effort. Some benchmarking is necessary first.

If I did it correctly, it doesn't seem that big of a deal in https://github.com/SciML/LinearSolve.jl/pull/358

ViralBShah commented 1 year ago

That seems correctly done. Unless they explicitly multi-threaded the getrf call, all the performance actually just comes from the matmul. openblas does have the natively multi-threaded getrf but I think it doesn't have access to the fast matmul kernels on M-series that Accelerate does: https://github.com/xianyi/OpenBLAS/blob/develop/lapack/getrf/getrf_parallel.c

I suppose the only benefit of the 64-bit version is that you don't have to convert the ipiv vector to 64-bit on return and save one small allocation.

ChrisRackauckas commented 1 year ago

I just cached the 32-bit version so the allocation is saved anyways. So yeah, seems like it's better to just support 32-bit there.