fommil / netlib-java

:rocket: High Performance Linear Algebra (low level)

Delegating BLAS #50

Closed: fommil closed this issue 11 years ago

fommil commented 11 years ago

The new GPU libraries (cuBLAS, clBLAS) do not actually implement CBLAS: they implement BLAS-like routines with non-standard prefixes (and I'm unsure about use from Fortran... they are certainly not binary compatible with LAPACK). Although this allows users to target the GPU explicitly, it is impractical to expect users - and the many tiers of middleware - to make source code changes.
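
To illustrate the mismatch, compare the Fortran BLAS symbol (as exported by the reference BLAS and ATLAS) with the legacy cuBLAS equivalent. Prototypes are abbreviated from memory of the respective headers, so treat them as a sketch rather than authoritative declarations:

```c
/* Fortran BLAS symbol, as called by LAPACK/ARPACK: all arguments by reference */
void dgemm_(const char *transa, const char *transb,
            const int *m, const int *n, const int *k,
            const double *alpha, const double *a, const int *lda,
            const double *b, const int *ldb,
            const double *beta, double *c, const int *ldc);

/* Legacy cuBLAS equivalent: non-standard name, scalars passed by value */
void cublasDgemm(char transa, char transb, int m, int n, int k,
                 double alpha, const double *A, int lda,
                 const double *B, int ldb,
                 double beta, double *C, int ldc);
```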

In addition, GPU acceleration is actually slower than the CPU for small arrays (unless batched, which is a non-trivial departure from the BLAS API).

A more practical solution would be to create a libblas (implementing BLAS, then wrapped with CBLAS) that delegates to the correct implementation at runtime. The deciding factors for choosing an implementation (ATLAS vs clBLAS) for each routine could be calculated empirically on a per-machine basis and saved into a config file, allowing the delegating lib to decide based on the call's parameters (e.g. array size). A rough sketch of the shape of such a shim follows.
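
Here is a minimal sketch of one routine of the delegating library, assuming the backend function pointers are resolved at startup (e.g. via dlopen/dlsym, see below) and the crossover threshold comes from the per-machine config file. The names `cpu_dgemm`, `gpu_dgemm` and `dgemm_gpu_threshold` are hypothetical, not existing code:

```c
/* Fortran-convention BLAS function pointer type: all arguments by reference. */
typedef void (*dgemm_t)(const char *transa, const char *transb,
                        const int *m, const int *n, const int *k,
                        const double *alpha, const double *a, const int *lda,
                        const double *b, const int *ldb,
                        const double *beta, double *c, const int *ldc);

/* Populated at startup from the per-machine config file. */
static dgemm_t cpu_dgemm;          /* e.g. ATLAS */
static dgemm_t gpu_dgemm;          /* e.g. a clBLAS wrapper */
static int dgemm_gpu_threshold;    /* empirically tuned crossover size */

/* The symbol we export: LAPACK/ARPACK call this name directly. */
void dgemm_(const char *transa, const char *transb,
            const int *m, const int *n, const int *k,
            const double *alpha, const double *a, const int *lda,
            const double *b, const int *ldb,
            const double *beta, double *c, const int *ldc)
{
    /* Route large multiplies to the GPU, small ones to the CPU. */
    dgemm_t impl = (*m >= dgemm_gpu_threshold && *n >= dgemm_gpu_threshold)
                   ? gpu_dgemm : cpu_dgemm;
    impl(transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc);
}
```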

From a C perspective, I do not know how to load a library containing functions with the same names as the ones we are implementing. There might need to be some dynamic library loading jiggery-pokery.

From a Java perspective, this library would look identical to libblas, so no code changes would be necessary. Note that we cannot work around the name collisions by renaming our symbols, because the native LAPACK and ARPACK need to be able to call correctly named BLAS routines.

fommil commented 11 years ago

dlopen/dlsym and LoadLibrary/GetProcAddress are my friends
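
For the record, a minimal sketch of the trick on POSIX systems: opening the backend with `RTLD_LOCAL` keeps its `dgemm_` out of the global symbol table, so it cannot collide with the delegating library's own exported `dgemm_`; `dlsym` then fetches the hidden symbol through the handle. The path and error handling are illustrative only:

```c
#include <dlfcn.h>
#include <stdio.h>

/* Load a backend BLAS whose symbols share our names. RTLD_LOCAL keeps
 * its dgemm_ out of the global namespace, so it neither shadows nor is
 * shadowed by the delegating library's own dgemm_. */
static void *load_backend_dgemm(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }
    return dlsym(handle, "dgemm_");  /* resolved via the handle, not globally */
}
```

On Windows, `LoadLibrary`/`GetProcAddress` play the same roles, and since DLLs resolve imports per-module rather than through a flat global namespace, the collision problem is less severe there to begin with.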

fommil commented 11 years ago

This deserves its own project: https://github.com/fommil/multiblas