Idein / qmkl6

BLAS library for VideoCore VI QPU (Raspberry Pi 4)
BSD 3-Clause "New" or "Revised" License

Details on how QMKL6 can be used with standard libraries #4

Closed. kovaszab closed this issue 3 years ago

kovaszab commented 3 years ago

Hello guys! I have successfully installed the package and all tests pass. Now I'm stuck on how to use it with libraries that use BLAS, for example NumPy. What should I do with the compiler flags? How do I provide them when compiling NumPy from source? I searched the NumPy docs with no luck. Could you guys please demystify this part?

And last but not least, I want to thank you for all your work on making Raspberry Pi GPU programming more accessible to the public. You guys rock!

Terminus-IMRC commented 3 years ago

Thank you for the interest!

You can use QMKL6 as a substitute for other BLAS libraries as long as you allocate memory with the mkl_malloc function instead of malloc. I think Python fundamentally allocates memory with malloc, so it looks hard to use QMKL6 with NumPy (extensive hacks on the memory allocation system would be needed).

However, if you can modify your C/C++ code to allocate memory with the mkl_malloc function, then it will work. In that case you can build your program with cc $(pkg-config --cflags --libs blas-qmkl6) source.c after installing the QMKL6 .deb package (see the README for instructions).
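The allocation rule above can be sketched roughly as follows. This is only an illustration, not code from the QMKL6 repository. The USE_QMKL6 macro, the demo_* names, the header name and the 64-byte alignment are all assumptions made for this example; mkl_malloc, mkl_free and cblas_sgemm are used with their Intel MKL / CBLAS signatures, which QMKL6 is said to follow. Without -DUSE_QMKL6 the sketch falls back to plain malloc and a naive loop so that it compiles on any machine.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifdef USE_QMKL6
/* ASSUMPTION: the header name below is a guess; check the QMKL6 README
   and test programs for the header that the .deb package installs. */
#include <cblas.h>
/* ASSUMPTION: 64-byte alignment is an arbitrary choice for this sketch. */
#define demo_alloc(bytes) mkl_malloc((bytes), 64)
#define demo_free(ptr)    mkl_free(ptr)
#else
/* Portable stand-ins so the sketch builds without QMKL6 installed. */
#define demo_alloc(bytes) malloc(bytes)
#define demo_free(ptr)    free(ptr)
#endif

/* Hypothetical helper: computes C = A * B for n x n row-major matrices.
   Inputs are copied into buffers obtained from demo_alloc, because the
   BLAS call must only see memory from mkl_malloc when QMKL6 is used.
   Returns 0 on success, -1 if an allocation fails. */
int demo_matmul(int n, const float *a_in, const float *b_in, float *c_out)
{
    size_t bytes = (size_t)n * (size_t)n * sizeof(float);
    float *a = demo_alloc(bytes);
    float *b = demo_alloc(bytes);
    float *c = demo_alloc(bytes);
    int status = -1;

    if (a && b && c) {
        memcpy(a, a_in, bytes);
        memcpy(b, b_in, bytes);
#ifdef USE_QMKL6
        /* Standard CBLAS call: C = 1.0 * A * B + 0.0 * C */
        cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0f, a, n, b, n, 0.0f, c, n);
#else
        /* Naive reference multiplication for machines without QMKL6. */
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                float sum = 0.0f;
                for (int k = 0; k < n; ++k)
                    sum += a[i * n + k] * b[k * n + j];
                c[i * n + j] = sum;
            }
#endif
        memcpy(c_out, c, bytes);
        status = 0;
    }

    if (a) demo_free(a);
    if (b) demo_free(b);
    if (c) demo_free(c);
    return status;
}
```

To try it against QMKL6, call demo_matmul from a main function and add -DUSE_QMKL6 to the build command given above. Copying into and out of the allocated buffers keeps the rest of the program free to use ordinary arrays, at the cost of extra memory traffic.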

kovaszab commented 3 years ago

Thank you for the quick response! I just wanted to do it in Python with NumPy for quick development, but I suppose it's then more practical to write C code for the parts of my use case that would rely on it. Anyway, still very helpful information. Thanks a lot!

Terminus-IMRC commented 3 years ago

I too hope it can be done, but BLAS libraries do not actually share exactly the same interface (e.g. Intel MKL and OpenBLAS have cblas_somatcopy, but Netlib BLAS and ATLAS do not). NumPy seems to use workarounds to support various BLAS libraries, which is why porting QMKL6 to NumPy is not an easy task.
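To illustrate the kind of workaround such interface gaps call for, here is a minimal sketch of a portable fallback for an out-of-place scaled copy or transpose, which a program could use when the BLAS it links against lacks cblas_somatcopy. The function somatcopy_fallback is hypothetical, handles row-major storage only, and is not taken from QMKL6, NumPy or any BLAS library.

```c
#include <stddef.h>

/* Hypothetical fallback, row-major only.
   A is rows x cols with leading dimension lda.
   transpose == 0: B = alpha * A    (B is rows x cols, leading dim ldb)
   transpose != 0: B = alpha * A^T  (B is cols x rows, leading dim ldb) */
void somatcopy_fallback(int transpose, size_t rows, size_t cols,
                        float alpha, const float *a, size_t lda,
                        float *b, size_t ldb)
{
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            float v = alpha * a[i * lda + j];
            if (transpose)
                b[j * ldb + i] = v;  /* element (i, j) lands at (j, i) */
            else
                b[i * ldb + j] = v;  /* plain scaled copy */
        }
    }
}
```

A build system could select between the library routine and a fallback like this one at configure time, depending on which BLAS is detected.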

Please refer to Intel's MKL documentation or our test programs for the usage of BLAS functions.

QMKL6 is made to have the same interface as Intel MKL, which also has the mkl_malloc function. If you also want to debug your program on an x86/x86_64 processor, you can use the Intel oneAPI toolkit, which was recently made freely available.

Terminus-IMRC commented 3 years ago

Closing. Thank you for letting us know about an application of QMKL6!