ChrisRackauckas closed 1 year ago
I believe the smarter thing to do is to use the NEWLAPACK symbols if available, falling back to the old ones if not. I'm not sure whether getrf is faster with the new symbols or the same. If the new LAPACK library has a faster LU, this is worth the effort. Some benchmarking is necessary first.
cc @vpuri3
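The lookup-with-fallback idea can be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the framework path, the `dgetrf$NEWLAPACK` and `dgetrf_` symbol names, and the helper `find_first_symbol` are assumptions made for illustration. On a machine without Accelerate it simply reports that nothing was found.

```julia
using Libdl

# Assumed location of the Accelerate framework binary on macOS.
const ACCELERATE_PATH = "/System/Library/Frameworks/Accelerate.framework/Accelerate"

# Hypothetical helper: return (name, pointer) for the first symbol in `names`
# that the library exports, or `nothing` if the library or every symbol is missing.
function find_first_symbol(libpath::AbstractString, names)
    handle = Libdl.dlopen(libpath; throw_error = false)
    handle === nothing && return nothing
    for name in names
        ptr = Libdl.dlsym(handle, name; throw_error = false)
        ptr === nothing || return (name, ptr)
    end
    return nothing
end

# Prefer the (assumed) new-LAPACK symbol, then fall back to the classic one.
candidates = ["dgetrf\$NEWLAPACK", "dgetrf_"]
found = find_first_symbol(ACCELERATE_PATH, candidates)
println(found === nothing ? "no Accelerate getrf symbol found" : "using $(found[1])")
```

Doing the lookup once and storing the result keeps the per-solve path free of any symbol resolution.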
Would you ever pass ipiv to aa_getrf? If so, that probably comes in as an Int64 vector and probably should be downcast to an Int32 vector.
We allocate all of the caches ourselves, so it's safe from that. It's set up so that all caches are built up front and repeated calls then reuse them, even for the info ref.
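A rough sketch of what such a preallocated cache could look like, including the Int64 to Int32 downcast raised in the question. The `GetrfCache` type and `load_pivots!` helper are hypothetical names for illustration and are not taken from the PR.

```julia
# Hypothetical cache holding the buffers a 32-bit getrf call needs.
struct GetrfCache
    ipiv::Vector{Int32}          # pivot indices in 32-bit integer layout
    info::Base.RefValue{Int32}   # status flag written by the routine
end

# Allocate both buffers once for an n-by-n factorization.
GetrfCache(n::Integer) = GetrfCache(Vector{Int32}(undef, n), Ref{Int32}(0))

# Copy user-supplied pivots (possibly Int64) into the cached Int32 buffer.
# copyto! converts element by element and throws InexactError on overflow.
function load_pivots!(cache::GetrfCache, ipiv::AbstractVector{<:Integer})
    length(ipiv) == length(cache.ipiv) ||
        throw(DimensionMismatch("pivot length mismatch"))
    copyto!(cache.ipiv, ipiv)
    return cache
end

cache = GetrfCache(3)
load_pivots!(cache, Int64[2, 3, 3])
```

Because the buffers live in the cache, a repeated solve only overwrites them in place and does not allocate a new pivot vector or info ref.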
Merging #355 (6d5aeb4) into main (464156c) will increase coverage by 47.93%. The diff coverage is 17.74%.
@@ Coverage Diff @@
## main #355 +/- ##
===========================================
+ Coverage 25.75% 73.68% +47.93%
===========================================
Files 18 19 +1
Lines 1254 1353 +99
===========================================
+ Hits 323 997 +674
+ Misses 931 356 -575
Files Changed | Coverage Δ | |
---|---|---|
src/LinearSolve.jl | 90.90% <ø> (+15.90%) | :arrow_up: |
src/appleaccelerate.jl | 7.27% <7.27%> (ø) | |
ext/LinearSolveMKLExt.jl | 92.30% <100.00%> (+92.30%) | :arrow_up: |
... and 15 files with indirect coverage changes
> I believe the smarter thing to do is to use NEWLAPACK symbols if available, falling back to the old one if not. Not sure if getrf is better in that or the same. In case the new LAPACK library has a faster LU, this is worth the effort. Some benchmarking is necessary first.
If I did it correctly, it doesn't seem that big of a deal in https://github.com/SciML/LinearSolve.jl/pull/358
That seems correctly done. Unless they explicitly multi-threaded the getrf call, all of the performance actually just comes from the matmul. OpenBLAS does have a natively multi-threaded getrf, but I don't think it has access to the fast matmul kernels on M-series chips that Accelerate does: https://github.com/xianyi/OpenBLAS/blob/develop/lapack/getrf/getrf_parallel.c
I suppose the only benefit of the 64-bit version is that you don't have to convert the ipiv vector to 64-bit on return and save one small allocation.
I just cached the 32-bit version so the allocation is saved anyways. So yeah, seems like it's better to just support 32-bit there.