clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL
Apache License 2.0

Problems with GemmSpecialCases #219

Open pavanky opened 8 years ago

pavanky commented 8 years ago

For example the following exists (where beta != 0):

But this does not (where beta == 0):

This is a problem when "C" is not initialized properly. For example, when "C" is allocated but not explicitly set to 0, its initial values can sometimes be NaN. Multiplying a NaN by 0 still yields NaN, so the NaNs propagate to the outputs.
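To illustrate the behavior described above, here is a minimal host-side sketch in plain C. It is not clBLAS code: `ref_sgemm` is a hypothetical reference routine (column-major, no transposes) that shows one way to treat beta == 0 as "write only", so the old contents of C are never read.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical reference routine, not part of clBLAS.
 * Column-major C = alpha*A*B + beta*C with no transposes.
 * A is m x k, B is k x n, C is m x n.
 * When beta == 0 the old contents of C are never read, so NaN or garbage
 * in an uninitialized C cannot reach the output. */
static void ref_sgemm(size_t m, size_t n, size_t k, float alpha,
                      const float *a, size_t lda,
                      const float *b, size_t ldb,
                      float beta, float *c, size_t ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += a[i + p * lda] * b[p + j * ldb];
            if (beta == 0.0f)
                c[i + j * ldc] = alpha * acc;  /* write only, C not read */
            else
                c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc];
        }
    }
}
```

With a 1 x 1 problem where C starts out as NaN, `0.0f * c[0]` is still NaN under IEEE 754 arithmetic, while `ref_sgemm` with beta == 0 overwrites the entry with alpha*A*B.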

One could argue this is according to the BLAS spec, but we haven't noticed this behavior in other BLAS implementations.

pavanky commented 8 years ago

@TimmyLiu We can fix the issue and send in a PR, but we are not sure we'd be comprehensive. Can you provide us a list of the kernels where only "B1" is implemented?

An easy solution would be to call clEnqueueFillBuffer when beta == 0 in these calls.

pavanky commented 8 years ago

I just realized clEnqueueFillBuffer may not work because of strides.
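A rough sketch of why strides get in the way, assuming "strides" here means a leading dimension `ldc` larger than the row count `m`: a single contiguous fill over the whole range would also overwrite the padding between columns, which may belong to a larger parent matrix. The host-side helper below (`zero_submatrix` is a hypothetical name, not a clBLAS or OpenCL function) zeroes only the m x n region of a column-major matrix and leaves the padding alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of clBLAS or OpenCL.
 * Zeroes the m x n submatrix of a column-major buffer with leading
 * dimension ldc. Elements in rows m..ldc-1 of each column are padding
 * and are deliberately left untouched. */
static void zero_submatrix(float *c, size_t m, size_t n, size_t ldc)
{
    assert(ldc >= m);
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i)
            c[i + j * ldc] = 0.0f;
}
```

On the device, the equivalent would be one fill per column or a small kernel that respects `ldc`, rather than a single fill over the whole buffer.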

TimmyLiu commented 8 years ago

Hi @pavanky. I see. All the special kernels are here: https://github.com/clMathLibraries/clBLAS/tree/master/src/library/blas/AutoGemm/UserGemmKernelSources although the fastest way is to bypass the special kernels when beta is zero.