Ivorforce opened 2 months ago
I implemented an axis-supporting reduce_dot simply by chaining a multiply and a sum.
It's not the best implementation, but it's short and adds almost no binary size (plus it's consistently a bit faster than calling the nd API separately for each operation).
With xtensor-blas, we can still accelerate when available, but this is a good start.
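To make the idea concrete, here is a minimal sketch of the multiply-and-sum approach in plain C++ (the function name and the fixed last-axis reduction are illustrative assumptions; NumDot's actual implementation operates on xtensor expressions over arbitrary axes):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch: reduce_dot over the last axis of two equally-shaped
// row-major 2D arrays, implemented as an element-wise multiply followed by
// a sum along that axis.
std::vector<double> reduce_dot_last_axis(
    const std::vector<double>& a,
    const std::vector<double>& b,
    std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols && b.size() == rows * cols);
    std::vector<double> out(rows, 0.0);
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            out[i] += a[i * cols + j] * b[i * cols + j];  // multiply, then accumulate
    return out;
}
```

For example, `reduce_dot_last_axis({1,2,3,4}, {5,6,7,8}, 2, 2)` reduces each row pair to a scalar, yielding `{17, 53}`. Fusing the multiply and the sum into one pass like this is also why it can beat two separate nd API calls, which would materialize the intermediate product array.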
xtensor-blas is header only, but it does require a BLAS binary.
I think it should be possible to implement the BLAS-related functions with a library soft bind: on launch, we check whether a BLAS is installed locally or globally. If so, we bind it. At runtime, we can check whether it's bound and, if so, use those methods for acceleration (or as the primary implementation).
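A soft bind along those lines could look like the following sketch (the library names, lookup order, and `try_bind_blas` helper are assumptions; `cblas_ddot` is the standard CBLAS dot-product symbol):

```cpp
#include <dlfcn.h>
#include <initializer_list>

// Illustrative soft-bind sketch: try to load a system BLAS at startup and
// resolve cblas_ddot; if nothing is found, the function pointer stays null
// and we fall back to a plain loop.
using ddot_fn = double (*)(int, const double*, int, const double*, int);

static ddot_fn g_ddot = nullptr;

void try_bind_blas() {
    // Never fails hard: library names to probe are an assumption here.
    for (const char* name : {"libcblas.so.3", "libopenblas.so.0"}) {
        if (void* handle = dlopen(name, RTLD_NOW | RTLD_GLOBAL)) {
            g_ddot = reinterpret_cast<ddot_fn>(dlsym(handle, "cblas_ddot"));
            if (g_ddot) return;
        }
    }
}

double dot(const double* a, const double* b, int n) {
    if (g_ddot) return g_ddot(n, a, 1, b, 1);  // accelerated BLAS path
    double s = 0.0;                            // portable fallback path
    for (int i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}
```

Either path returns the same result, so callers never need to know whether a BLAS was found. Note that on older glibc, linking may require `-ldl` for `dlopen`/`dlsym`.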
If someone wants to use it, they can drop the binaries into the NumDot location manually (though we should write a tutorial explaining how).
xtensor has support for linear algebra functions through xtensor-blas.
Unfortunately, this requires BLAS and LAPACK binaries. Accordingly, it should probably only be part of a "wide scope" download.
Another problem is that (as they say) broadcasting is not fully supported for most of these functions yet. But I suppose that's OK: it will crash, and people can use the broadcast function (#14) instead. Better yet, we could check and broadcast ourselves in NumDot when needed.
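Checking broadcastability ourselves would start with computing the NumPy-style broadcast shape of the two operands, roughly as below (a hypothetical helper, not existing NumDot API), so that mismatches fail early with a clear error instead of crashing inside xtensor-blas:

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Illustrative sketch: compute the NumPy-style broadcast shape of two
// operand shapes. Dimensions are compared right-aligned; a size of 1
// stretches to match the other operand, anything else must be equal.
std::vector<std::size_t> broadcast_shape(
    std::vector<std::size_t> a, std::vector<std::size_t> b) {
    if (a.size() < b.size()) a.swap(b);           // make a the longer shape
    b.insert(b.begin(), a.size() - b.size(), 1);  // left-pad b with ones
    std::vector<std::size_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] == b[i] || b[i] == 1) out[i] = a[i];
        else if (a[i] == 1) out[i] = b[i];
        else throw std::invalid_argument("shapes are not broadcastable");
    }
    return out;
}
```

For instance, shapes `(2, 3, 1)` and `(3, 4)` broadcast to `(2, 3, 4)`, while `(2, 3)` and `(4, 3)` throw. With the result shape in hand, NumDot could materialize or view-broadcast each operand before handing it to a BLAS-backed routine.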