clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL
Apache License 2.0

clBLAS-client --cpu gives CL_INVALID_COMMAND_QUEUE error on OS X #187

Open GOFAI opened 8 years ago

GOFAI commented 8 years ago

I built clBLAS 2.8 on my 13" MacBook Pro (Retina, late 2014, OS X 10.10.5) using a Homebrew formula I tweaked from the one in the homebrew-science tap. Everything seemed to install fine, but I get the following error when I try to check it with clBLAS-client:

JOHNNIAC:clBlas Walrus$ clBLAS-client --cpu
OpenCL error -36 on line 350 of /tmp/clblas20151115-2237-z21tyj/clBLAS-2.8/src/library/blas/xgemm.cc
Assertion failed: (false), function clblasGemm, file /tmp/clblas20151115-2237-z21tyj/clBLAS-2.8/src/library/blas/xgemm.cc, line 350.
Abort trap: 6

-36 is apparently CL_INVALID_COMMAND_QUEUE. Any idea what's going awry here?
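For reference, the numeric code can be turned back into its symbolic name with a small lookup. This is only a sketch: `openclErrorName` is a hypothetical helper, not part of clBLAS, and it covers just a handful of codes whose values are copied from the Khronos `cl.h` header so that it compiles without the OpenCL SDK.

```cpp
#include <string>

// Hypothetical helper (not part of clBLAS): map an OpenCL status code to its
// symbolic name. The numeric values are a small subset copied from cl.h.
inline std::string openclErrorName(int code) {
    switch (code) {
        case 0:   return "CL_SUCCESS";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -30: return "CL_INVALID_VALUE";
        case -33: return "CL_INVALID_DEVICE";
        case -34: return "CL_INVALID_CONTEXT";
        case -36: return "CL_INVALID_COMMAND_QUEUE";
        case -38: return "CL_INVALID_MEM_OBJECT";
        // Anything not listed above is reported with its raw number.
        default:  return "unknown OpenCL error " + std::to_string(code);
    }
}
```

Calling `openclErrorName(-36)` gives the name quoted above, which confirms the reading of the error in the log.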

GOFAI commented 8 years ago

clBLAS-client --gpu gives the exact same message, fwiw.

hughperkins commented 8 years ago

(Just in case it's useful: CL_INVALID_COMMAND_QUEUE usually means that a kernel read or wrote outside the bounds of an array.)

GOFAI commented 8 years ago

Line 350 of /tmp/clblas20151115-2237-z21tyj/clBLAS-2.8/src/library/blas/xgemm.cc:

err = clGetCommandQueueInfo( commandQueues[0], CL_QUEUE_DEVICE, sizeof(clDevice), &clDevice, NULL);
CL_CHECK(err)

I suppose commandQueues[0] doesn't refer to a valid command queue. It also turns out that functions other than gemm get further along before they too end up with a CL_INVALID_COMMAND_QUEUE:

JOHNNIAC:glasstone Walrus$ clblas-client -f gemv
    StatisticalTimer:: Pruning 1 samples from clfunc
    StatisticalTimer:: Pruning 0 samples from clGemv
BLAS kernel execution time < ns >: 38210.5
BLAS kernel execution Gflops < 2.0*M*N/time >: 0.0857565
OPENCL_V_THROWERROR< CL_INVALID_COMMAND_QUEUE > (274): releasing command queue
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: OPENCL_V_THROWERROR< CL_INVALID_COMMAND_QUEUE > (274): releasing command queue
Abort trap: 6

hughperkins commented 8 years ago

It's normally because you wrote off the end of an array. You can have a look at https://github.com/clMathLibraries/clBLAS/issues/108 for an example of how to debug this.

Edit: but basically the concept is:

Edit 2: I recommend turning off OpenCL optimizations; a paragraph in the linked issue gives an example of how to do this.
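As a rough illustration of what turning off optimizations means at the API level: the standard OpenCL compiler option is `-cl-opt-disable`, passed in the options string given to `clBuildProgram`. The helper below is hypothetical and only builds that string; how clBLAS itself assembles its build options is not shown in this thread, so where to hook it in is an assumption left open.

```cpp
#include <string>

// Hypothetical helper: append the standard OpenCL compiler flag that disables
// optimizations to an existing build-options string, without duplicating it.
inline std::string withOptimizationsDisabled(const std::string& options) {
    const std::string flag = "-cl-opt-disable";
    if (options.find(flag) != std::string::npos) {
        return options;  // flag already present, leave the string unchanged
    }
    return options.empty() ? flag : options + " " + flag;
}
```

The resulting string would then be passed (via `.c_str()`) as the options argument of `clBuildProgram`.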

GOFAI commented 8 years ago

Thanks! Given the problems I'm having, I'm kind of hoping that it's not any single kernel, but rather a single problem that's affecting everything.

My impression from scrutinizing clfunc_common.hpp is that the "releasing command queue" message will only appear if the error occurred when the destructor was called on clblasfunc. (I get a variant of that message for all functions BUT gemm).

If that's the case, does it make sense to try to go through all the kernels inserting return;?

hughperkins commented 8 years ago

> My impression from scrutinizing clfunc_common.hpp is that the "releasing command queue" message will only appear if the error occurred when the destructor was called on clblasfunc. (I get a variant of that message for all functions BUT gemm).

Ah, interesting. Fair enough :-)