clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL
Apache License 2.0

routines require memory transfer to host when scalar arguments are on device #287

Open Ulfgard opened 8 years ago

Ulfgard commented 8 years ago

Hi,

I am rather new to the library. If I am missing a part of the API, please point me to the relevant documentation and close this. I have searched through the provided doxygen documentation but could not find it.

Assume I want to compute (I + vv^T) x = x + (v^T x) v

For alpha = v^T x I can use clblasXdot, which computes alpha in device memory. However, the subsequent clblasXaxpy call for x = x + alpha*v requires alpha to be on the host. This adds 1) an additional memory transfer and 2) another synchronisation point, because we have to wait until the transfer of alpha is done. AFAIK this holds for almost all of the API, which makes it hard to use the library asynchronously. A solution would be to provide an API like cblasXaxpy_async which takes cl_mem parameters instead of scalars.
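
To make the data dependency concrete, here is a minimal CPU reference sketch in plain C of the two steps described above. It is not clBLAS code: `ref_dot`, `ref_axpy` and `rank_one_apply` are hypothetical helper names that only mirror the roles of clblasXdot and clblasXaxpy, and single precision is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1: alpha = v^T x. This plays the role of clblasXdot,
 * which leaves its result in device memory. */
float ref_dot(size_t n, const float *v, const float *x)
{
    float alpha = 0.0f;
    for (size_t i = 0; i < n; ++i)
        alpha += v[i] * x[i];
    return alpha;
}

/* Step 2: x = x + alpha * v. This plays the role of clblasXaxpy,
 * which takes alpha by value, i.e. as a host-side scalar. */
void ref_axpy(size_t n, float alpha, const float *v, float *x)
{
    for (size_t i = 0; i < n; ++i)
        x[i] += alpha * v[i];
}

/* x <- (I + v v^T) x, computed as x + (v^T x) v without forming the matrix. */
void rank_one_apply(size_t n, const float *v, float *x)
{
    assert(v != NULL && x != NULL);
    /* The scalar must be fully computed before the update can start.
     * On the device this is where the read-back and wait would happen. */
    float alpha = ref_dot(n, v, x);
    ref_axpy(n, alpha, v, x);
}
```

Because `alpha` is passed by value in step 2, the caller has to hold the finished result of step 1 on the host. That is exactly the transfer and synchronisation point a cl_mem-based scalar argument would avoid.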