clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL
Apache License 2.0

Error compiling during run-time #293

Closed aml5600 closed 4 years ago

aml5600 commented 7 years ago

I have been using clBLAS for about a month and have never had this issue until today:

========================================================

AN INTERNAL KERNEL BUILD ERROR OCCURRED!
device name = GeForce GT 730M
error = -9999
memory pattern = Register accumulation based swap, computing kernel generator
Subproblem dimensions: dims[0].itemY = 32, dims[0].itemX = 32, dims[0].y = 32, dims[0].x = 32, dims[0].bwidth = 4; ; dims[1].itemY = 4, dims[1].itemX = 4, dims[1].y = 4, dims[1].x = 4, dims[1].bwidth = 4; ;
Parallelism granularity: pgran->wgDim = 1, pgran->wgSize[0] = 64, pgran->wgSize[1] = 1, pgran->wfSize = 64
Kernel extra flags: 26784
Source:

#ifdef DOUBLE_PRECISION
    #ifdef cl_khr_fp64
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable
    #else
    #pragma OPENCL EXTENSION cl_amd_fp64 : enable
    #endif
#endif

__kernel void Daxpy_kernel( double alpha, __global double *_X, __global double *_Y, uint N, uint offx, int incx, uint offy, int incy )
{
__global double *X = _X + offx;
__global double *Y = _Y + offy;

if ( incx < 0 ) {
    X = X + (N - 1) * abs(incx);
}
if ( incy < 0 ) {
    Y = Y + (N - 1) * abs(incy);
}

int gOffset;
for( gOffset=(get_global_id(0) * 2); (gOffset + 2 - 1)<N; gOffset+=( get_global_size(0) * 2 ) )
{
    double2 vReg1, vReg2;

    #ifdef INCX_NONUNITY
         vReg1 = (double2)(  (X + (gOffset*incx))[0 + ( incx * 0)],  (X + (gOffset*incx))[0 + ( incx * 1)]);
    #else
        vReg1 = *((__global double2*)( (X + gOffset) ));
    #endif

    #ifdef INCY_NONUNITY
         vReg2 = (double2)(  (Y + (gOffset*incy))[0 + ( incy * 0)],  (Y + (gOffset*incy))[0 + ( incy * 1)]);
    #else
        vReg2 = *((__global double2*)( (Y + gOffset) ));
    #endif

     vReg2 = mad( alpha, vReg1 , vReg2);

    #ifdef INCY_NONUNITY
          (Y + (gOffset * incy))[0 + ( incy  * 0)] =  vReg2.S0;
          (Y + (gOffset * incy))[0 + ( incy  * 1)] =  vReg2.S1; ;
    #else

        *((__global double2*)((Y + (gOffset * incy)) ) +  0 ) =  vReg2;
    #endif
}

// Loop for the last thread to handle the tail part of the vector
// Using the same gOffset used above
for( ; gOffset<N; gOffset++ )
{
    double sReg1, sReg2;
    sReg1 = X[gOffset * incx];
    sReg2 = Y[gOffset * incy];

     sReg2 = mad( alpha, sReg1 , sReg2);
    Y[gOffset * incy] = sReg2;
    }

}


Build log:

========================================================

Since the build log is empty, it is difficult to know where to begin searching for the error. Additionally, I cannot find what error -9999 refers to.
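
For context on the error number: the standard OpenCL status codes are small negative integers defined in the Khronos `cl.h` header, and -9999 does not appear to be among them, which suggests a vendor-specific code rather than a documented one. The sketch below is a minimal lookup over a subset of the standard codes. The helper name `cl_error_name` is hypothetical (it is not part of clBLAS or the OpenCL API), and the table is deliberately incomplete.

```c
/* Hypothetical helper, not part of clBLAS or the OpenCL API.
 * Maps a subset of the standard OpenCL status codes (numeric values as
 * defined in the Khronos cl.h header) to their symbolic names. Any code
 * missing from this table, including -9999, falls through to the default. */
static const char *cl_error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -2:  return "CL_DEVICE_NOT_AVAILABLE";
    case -3:  return "CL_COMPILER_NOT_AVAILABLE";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -42: return "CL_INVALID_BINARY";
    case -43: return "CL_INVALID_BUILD_OPTIONS";
    case -44: return "CL_INVALID_PROGRAM";
    default:  return "UNKNOWN (not in this table; possibly vendor-specific)";
    }
}
```

With this table, `cl_error_name(-11)` returns `"CL_BUILD_PROGRAM_FAILURE"`, which is the code a failing compile normally reports, while `cl_error_name(-9999)` falls through to the default branch.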

Any help would be greatly appreciated!

tingxingdong commented 7 years ago

If there is no build log and you do not know what error -9999 refers to, how do you know that Daxpy_kernel is the one failing? Why did you post this kernel here?