JuliaLinearAlgebra / libblastrampoline

Using PLT trampolines to provide a BLAS and LAPACK demuxing library.
MIT License

Support Apple Accelerate #16

Closed: staticfloat closed this issue 3 years ago

staticfloat commented 3 years ago

macOS provides Accelerate.framework, which we can use to provide BLAS on macOS, especially on Apple silicon. Unfortunately, there seem to be some ABI differences in these libraries, specifically in how return values are passed back. Since we want to provide the gfortran-compatible interface to all client code, we'll need to add shims that adjust the return value and forward to those first, so they can do the small amount of argument and return-value massaging required.
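The kind of shim described above can be illustrated with a small simulation. This is only a sketch, assuming the difference is the f2c-style one that the cross-referenced dotwrp project addresses, where a single-precision function such as sdot hands back a double rather than a float. `f2c_style_sdot` and `make_float_return_shim` are hypothetical names, the stand-in only handles positive increments, and nothing here calls into Accelerate or libblastrampoline.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f2c_style_sdot(n, x, incx, y, incy):
    """Hypothetical stand-in for an f2c-convention sdot: returns a double."""
    return sum(x[i * incx] * y[i * incy] for i in range(n))


def make_float_return_shim(f2c_func):
    """Wrap a double-returning function so callers see a float32 result."""
    def shim(*args):
        # The only massaging needed in this simplified model is narrowing
        # the returned double to single precision.
        return to_float32(f2c_func(*args))
    return shim


# Client code calls the shim and never sees the double-returning function.
sdot = make_float_return_shim(f2c_style_sdot)
result = sdot(2, [0.1, 0.2], 1, [1.0, 1.0], 1)
```

A real shim would live in compiled code and forward to the underlying symbol. The point of the sketch is only that the client-facing signature stays fixed while the wrapper absorbs the difference.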

We should be able to auto-detect this much as we auto-detect bitwidth: we can call an affected function (such as sdot) and inspect the result to see how return values are being passed back.
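One way such a probe could work is sketched below as a pure simulation. It assumes a little-endian platform where float and double return values share one floating-point register, so a caller expecting a float reads only the low 4 bytes of whatever the callee wrote. The function names and the classification labels are hypothetical, and this is not taken from libblastrampoline's actual detection code.

```python
import struct

EXPECTED = 0.25  # 0.5 * 0.5, exactly representable as float32 and float64


def register_after_float_return(value):
    """8-byte register image when the callee returns a float32.

    The upper 4 bytes are zeroed here purely for the simulation.
    """
    return struct.pack("<f", value) + b"\x00" * 4


def register_after_double_return(value):
    """8-byte register image when the callee returns a float64."""
    return struct.pack("<d", value)


def read_float_return(register):
    """What a caller expecting float32 sees: only the low 4 bytes."""
    return struct.unpack("<f", register[:4])[0]


def classify_convention(observed):
    """Map the value observed by a float-expecting caller to a convention."""
    if observed == EXPECTED:
        return "float return (gfortran-style)"
    if observed == 0.0:
        # The low 4 bytes of the double 0.25 are all zero.
        return "double return (f2c-style)"
    return "unknown"


as_float = classify_convention(read_float_return(register_after_float_return(EXPECTED)))
as_double = classify_convention(read_float_return(register_after_double_return(EXPECTED)))
```

The probe value matters: 0.25 is chosen so that the two conventions give clearly different, exactly comparable readings instead of values that differ only by rounding.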

X-ref: https://github.com/tenomoto/dotwrp

ViralBShah commented 3 years ago

While this would be nice, I will note that Apple's LAPACK is too old for Julia. Thus we have to compile LAPACK separately and link it against Accelerate. We still don't get all the nice multi-threaded LAPACK that other libraries like OpenBLAS and MKL have.

It is still a good idea to be able to use Accelerate, especially on Apple silicon.

simonbyrne commented 3 years ago

It might be worth filing some bug reports with Apple: https://twitter.com/stephentyrone/status/1293162645374349312?s=20

What LAPACK functions are multithreaded in OpenBLAS (as opposed to calling a multithreaded BLAS)?

ViralBShah commented 3 years ago

These ones: https://github.com/xianyi/OpenBLAS/tree/develop/lapack

MKL may do more (or not). It's hard to know.

But Apple uses an f2c'ed LAPACK from many years ago. Not sure if they will be convinced, given they don't even want to maintain a Fortran compiler.

ViralBShah commented 3 years ago

Where does one even report such a bug?

staticfloat commented 3 years ago

It would be good to get a simple benchmark script and run it on x86_64 and aarch64 doing a three-way comparison of BLAS/LAPACK on OpenBLAS, Accelerate and MKL (two-way on aarch64, obviously).

This is all so laughably easy now that we've got LBT that it shouldn't take more than 10 minutes to write up the test suite with some appropriate lbt_forward() calls.
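The shape of such a comparison script might look like the following backend-agnostic harness. It is only a sketch: each backend is represented by a hypothetical zero-argument `activate` callable, which in a real script would wrap whatever call switches the BLAS/LAPACK backend (for example an lbt_forward() call), and the workload here is a pure-Python placeholder, not a linear algebra routine.

```python
import time


def best_time(workload, repeats=5):
    """Return the fastest of `repeats` wall-clock timings of workload()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best


def compare_backends(backends, workload, repeats=5):
    """Time the same workload once per backend.

    `backends` maps a display name to a hypothetical zero-argument callable
    that makes that backend the active one before timing starts.
    """
    timings = {}
    for name, activate in backends.items():
        activate()
        timings[name] = best_time(workload, repeats)
    return timings


# Stand-ins so the sketch runs anywhere; these do not load any library.
activated = []
backends = {
    "OpenBLAS": lambda: activated.append("OpenBLAS"),
    "Accelerate": lambda: activated.append("Accelerate"),
}
timings = compare_backends(backends, lambda: sum(range(10_000)), repeats=3)
```

Taking the best of several repeats rather than the mean is a common choice for short benchmarks, since it reduces the influence of unrelated system noise.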

I think we only have a shot at getting Apple to pay attention if we show something conclusive, like "eigs() is 10x slower than OpenBLAS on x86_64" or something. Maybe not even then, but it's worth a shot.

ViralBShah commented 3 years ago

First I have to get myself one of those M1 macs...

staticfloat commented 3 years ago

> First I have to get myself one of those M1 macs...

Unless you really want to be an early-adopter, I suggest holding off until we rebuild Yggdrasil. We still don't have a great way to do that, so it's going to be a few months before things are truly smooth, I think.

simonbyrne commented 3 years ago

> Where does one even report such a bug?

I guess https://developer.apple.com/bug-reporting/

staticfloat commented 3 years ago

Yeah, you file it all in Feedback Assistant. If you have something actionable, I can pass the FB number to our developer liaison at Apple, who seems to be doing a good job of passing things on to the relevant internal teams.