clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL
Apache License 2.0

Bug fixes to AutoGEMM and DTRSM, DTRTRI #221

Closed: pavanky closed this 8 years ago

pavanky commented 8 years ago

@kknox @TimmyLiu This PR fixes all the bugs we are seeing in release mode with ArrayFire. There is still another bug that occurs only in debug mode, which we are investigating at lower priority.

TimmyLiu commented 8 years ago

@pavanky Thanks! Did you have a chance to run any performance benchmarks? I am nervous about adding if statements inside the kernel, since they may add cycles. I am not sure, though, because all threads (whether beta==0 or not) should execute the same path.

pavanky commented 8 years ago

@TimmyLiu We haven't explicitly benchmarked the code, but I don't think the if conditions add overhead because all threads take the same execution path. There is no thread divergence.
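
To make the point about uniform branches concrete, here is a minimal sketch in plain C rather than OpenCL. It is not the code from this PR: the function name `gemm_writeback` and its parameters are hypothetical, and it assumes the added condition is a `beta == 0` check that skips reading C. Each loop iteration stands in for one work-item. Because `beta` is a single scalar passed to the whole kernel, every work-item evaluates the condition to the same result, so the branch cannot diverge.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the write-back step of a GEMM kernel.
 * acc[i] holds the already computed element of A*B; c is updated in place.
 * Each loop iteration plays the role of one GPU work-item. */
void gemm_writeback(size_t n, float alpha, float beta,
                    const float *acc, float *c)
{
    assert(acc != NULL && c != NULL);

    for (size_t i = 0; i < n; ++i) {
        if (beta == 0.0f) {
            /* Uniform branch: beta is identical for every work-item.
             * C is never read here, so a stale NaN or Inf in the
             * output buffer cannot leak into the result. */
            c[i] = alpha * acc[i];
        } else {
            /* General case: scale the existing C and accumulate. */
            c[i] = alpha * acc[i] + beta * c[i];
        }
    }
}
```

Calling `gemm_writeback(3, 2.0f, 0.0f, acc, c)` with NaN values already in `c` overwrites them with `alpha * acc[i]`, whereas an unconditional `beta * c[i]` would propagate the NaN, since `0 * NaN` is NaN under IEEE 754.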

TimmyLiu commented 8 years ago

@pavanky I agree. Let me double-check that it doesn't increase register usage either, since performance can be sensitive to register count as well.