Closed: inferrna closed this issue 8 years ago
I cannot reproduce your error on my end (mostly because I did not build Caffe). One thing I am (pretty) sure of is that with a 1x1 matrix the above kernel won't be used. Anyway, can you give me a set of parameters to sgemm that causes this error?
I also got a failed test, RowMajor_SmallRange/GEMM.sgemm/0: https://gist.github.com/inferrna/230cf27802cfde04bf58 It shows the same error along with others.
Sounds like the OpenCL compiler threw the errors. It is not a test failure per se; the compiler does not like the generated kernel. Is this an Intel graphics card?
Is this an Intel graphics card?
Yes, it is Beignet on a Haswell i5-4200 GPU
Same error, NVIDIA 940M
Some of the errors I get:
OpenCL error -11 on line 163 of /home/user/git/DeepCL/clMathLibraries/clBLAS/src/library/blas/xgemm.cc
clBuildProgram Failed
err = -11
Error: Failed to build program executable!
Build Log:
<kernel>:63:65: error: array initializer must be an initializer list
DATA_TYPE_STR rC[MICRO_TILE_NUM_ROWS][MICRO_TILE_NUM_COLS] = {0};
^
OpenCL error -45 on line 187 of /home/user/git/DeepCL/clMathLibraries/clBLAS/src/library/blas/xgemm.cc
OpenCL error -48 on line 232 of /home/user/git/DeepCL/clMathLibraries/clBLAS/src/library/blas/xgemm.cc
OpenCL error -48 on line 232 of /home/user/git/DeepCL/clMathLibraries/clBLAS/src/library/blas/xgemm.cc
OpenCL error -48 on line 232 of /home/user/git/DeepCL/clMathLibraries/clBLAS/src/library/blas/xgemm.cc
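For readers decoding the log above, the numeric codes can be mapped to their symbolic names. The following is a minimal sketch, not part of clBLAS or DeepCL: the helper name `clErrorName` is hypothetical, and the numeric values are the ones defined in the Khronos `CL/cl.h` header. It also covers -38 and -52, which show up in later logs in this thread.

```cpp
#include <string>

// Hypothetical helper: symbolic names for the OpenCL status codes seen in
// this thread. Values are taken from the Khronos CL/cl.h header.
inline std::string clErrorName(int err) {
    switch (err) {
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -38: return "CL_INVALID_MEM_OBJECT";
        case -45: return "CL_INVALID_PROGRAM_EXECUTABLE";
        case -48: return "CL_INVALID_KERNEL";
        case -52: return "CL_INVALID_KERNEL_ARGS";
        default:  return "unknown (" + std::to_string(err) + ")";
    }
}
```

Read this way, -11 is the actual build failure, and the -45 and -48 that follow are likely knock-on errors from trying to use a program that never built.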
And the test code:
TEST(testClBlas, colMajorTransB) {
    EasyCL *cl = EasyCL::createForFirstGpuOtherwiseCpu();
    float A[] = {1, 3,
                 2, 7,
                 9, 5};
    float B[] = {3,
                 -1};
    float C[3];
    transpose(A, 3, 2);
    ClBlasInstance clblasInstance;
    CLWrapper *AWrap = cl->wrap(6, A);
    CLWrapper *BWrap = cl->wrap(2, B);
    CLWrapper *CWrap = cl->wrap(3, C);
    AWrap->copyToDevice();
    BWrap->copyToDevice();
    ClBlasHelper::Gemm(
        cl,
        clblasColumnMajor,
        clblasNoTrans, clblasTrans,
        3, 2, 1,
        1,
        AWrap, 0,
        BWrap, 0,
        0,
        CWrap, 0
    );
    CWrap->copyToHost();
    transpose(C, 1, 3);
    EXPECT_EQ(0, C[0]);
    EXPECT_EQ(-1, C[1]);
    EXPECT_EQ(22, C[2]);
    delete CWrap;
    delete BWrap;
    delete AWrap;
    delete cl;
}
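The expected values in the assertions above (0, -1, 22) can be checked on the CPU. This is a minimal sketch of a column-major reference GEMM for non-transposed operands. `referenceGemmColMajor` is a hypothetical helper, not part of DeepCL, EasyCL, or clBLAS, and its (m, n, k) argument order is my own choice, which may not match the order `ClBlasHelper::Gemm` expects.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: C = alpha * A * B + beta * C, all column-major.
// A is m x k, B is k x n, C is m x n. C is taken by value and returned.
std::vector<float> referenceGemmColMajor(
        std::size_t m, std::size_t n, std::size_t k,
        float alpha,
        const std::vector<float>& A,
        const std::vector<float>& B,
        float beta,
        std::vector<float> C) {
    for (std::size_t j = 0; j < n; ++j) {
        for (std::size_t i = 0; i < m; ++i) {
            float acc = 0.0f;  // accumulator must start at zero
            for (std::size_t p = 0; p < k; ++p) {
                acc += A[p * m + i] * B[j * k + p];
            }
            // As in BLAS, beta == 0 means the old contents of C are ignored.
            const float prev = (beta == 0.0f) ? 0.0f : beta * C[j * m + i];
            C[j * m + i] = alpha * acc + prev;
        }
    }
    return C;
}
```

With A stored column-major as {1, 2, 9, 3, 7, 5}, B as {3, -1}, and m = 3, n = 1, k = 2, the result is {0, -1, 22}, which matches the test's expectations.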
It's caused by line 213 of zgemm_gcn.cl:
// registers
DATA_TYPE_STR rC[MICRO_TILE_NUM_ROWS][MICRO_TILE_NUM_COLS] = {0};
Actually, I think it's in library/blas/AutoGemm, line 221:
" DATA_TYPE_STR rC[MICRO_TILE_NUM_ROWS][MICRO_TILE_NUM_COLS] = {0};" + endLine +
But changing this to:
" DATA_TYPE_STR rC[MICRO_TILE_NUM_ROWS][MICRO_TILE_NUM_COLS];" + endLine +
... I still get incorrect results, and some errors, eg:
[ RUN ] testClBlas.colMajor
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
initializing clblas
OpenCL error -38 on line 232 of /home/user/git/DeepCL/clMathLibraries/clBLAS/src/library/blas/xgemm.cc
OpenCL error -38 on line 232 of /home/user/git/DeepCL/clMathLibraries/clBLAS/src/library/blas/xgemm.cc
OpenCL error -38 on line 232 of /home/user/git/DeepCL/clMathLibraries/clBLAS/src/library/blas/xgemm.cc
OpenCL error -52 on line 239 of /home/user/git/DeepCL/clMathLibraries/clBLAS/src/library/blas/xgemm.cc
/home/user/git/DeepCL/test/testClBlas.cpp:215: Failure
Value of: C[0]
Actual: 0.0320347
Expected: 0
/home/user/git/DeepCL/test/testClBlas.cpp:216: Failure
Value of: C[1]
Actual: -0.0424018
Expected: -1
/home/user/git/DeepCL/test/testClBlas.cpp:217: Failure
Value of: C[2]
Actual: -0.0419845
Expected: 22
clblas teardown
[ FAILED ] testClBlas.colMajor (118 ms)
test case:
TEST(testClBlas, colMajor) {
    EasyCL *cl = EasyCL::createForFirstGpuOtherwiseCpu();
    float A[] = {1, 3,
                 2, 7,
                 9, 5};
    float B[] = {3,
                 -1};
    float C[3];
    transpose(A, 3, 2);
    transpose(B, 2, 1);
    ClBlasInstance clblasInstance;
    CLWrapper *AWrap = cl->wrap(6, A);
    CLWrapper *BWrap = cl->wrap(2, B);
    CLWrapper *CWrap = cl->wrap(3, C);
    AWrap->copyToDevice();
    BWrap->copyToDevice();
    ClBlasHelper::Gemm(
        cl,
        clblasColumnMajor,
        clblasNoTrans, clblasNoTrans,
        3, 2, 1,
        1,
        AWrap, 0,
        BWrap, 0,
        0,
        CWrap, 0
    );
    CWrap->copyToHost();
    transpose(C, 1, 3);
    EXPECT_EQ(0, C[0]);
    EXPECT_EQ(-1, C[1]);
    EXPECT_EQ(22, C[2]);
    delete CWrap;
    delete BWrap;
    delete AWrap;
    delete cl;
}
Ah, changing KernelOpenCL.py like this gets rid of the incorrect results for some tests:
" DATA_TYPE_STR rC[MICRO_TILE_NUM_ROWS][MICRO_TILE_NUM_COLS] = { {0} };" + endLine +
Successful test results:
[ RUN ] testClBlas.basic
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
initializing clblas
clblas teardown
[ OK ] testClBlas.basic (418 ms)
[ RUN ] testClBlas.transA
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
1 2 9
3 7 5
initializing clblas
clblas teardown
[ OK ] testClBlas.transA (205 ms)
[ RUN ] testClBlas.transB
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
3
-1
initializing clblas
clblas teardown
[ OK ] testClBlas.transB (6017 ms)
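The three initializer forms tried above behave differently. `= {0}` on a two-dimensional array relies on brace elision, which the compiler in the build log apparently rejected. Dropping the initializer entirely leaves the array with indeterminate contents, which would explain the garbage results if `rC` is used as an accumulator (an assumption on my part, based on its name and the `// registers` comment). `= { {0} }` sets the first element explicitly and zero-fills the rest. Below is a minimal C++ sketch of the same pattern, not the generated OpenCL kernel; `accumulateTile`, `kTileRows`, and `kTileCols` are hypothetical names.

```cpp
#include <cstddef>

constexpr std::size_t kTileRows = 2;  // hypothetical stand-in for MICRO_TILE_NUM_ROWS
constexpr std::size_t kTileCols = 2;  // hypothetical stand-in for MICRO_TILE_NUM_COLS

// Accumulates `steps` rank-1 updates into a small tile.
// a holds steps * kTileRows values, b holds steps * kTileCols values,
// out receives kTileRows * kTileCols values in row-major order.
void accumulateTile(const float* a, const float* b, std::size_t steps, float* out) {
    // Nested braces: first element set to 0, all others zero-initialized.
    // Without any initializer, the += below would read indeterminate values.
    float rC[kTileRows][kTileCols] = { {0} };
    for (std::size_t p = 0; p < steps; ++p) {
        for (std::size_t r = 0; r < kTileRows; ++r) {
            for (std::size_t c = 0; c < kTileCols; ++c) {
                rC[r][c] += a[p * kTileRows + r] * b[p * kTileCols + c];
            }
        }
    }
    for (std::size_t r = 0; r < kTileRows; ++r) {
        for (std::size_t c = 0; c < kTileCols; ++c) {
            out[r * kTileCols + c] = rC[r][c];
        }
    }
}
```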
Crashed test:
[ RUN ] testClBlas.colMajor
Using NVIDIA Corporation , OpenCL platform: NVIDIA CUDA
Using OpenCL device: GeForce 940M
initializing clblas
OpenCL error -38 on line 232 of /home/user/git/DeepCL/clMathLibraries/clBLAS/src/library/blas/xgemm.cc
OpenCL error -38 on line 232 of /home/user/git/DeepCL/clMathLibraries/clBLAS/src/library/blas/xgemm.cc
OpenCL error -38 on line 232 of /home/user/git/DeepCL/clMathLibraries/clBLAS/src/library/blas/xgemm.cc
Segmentation fault
(Note: the remaining crashed tests are fixed in pull https://github.com/clMathLibraries/clBLAS/pull/163. Those crashes have a different cause than this issue, https://github.com/clMathLibraries/clBLAS/issues/153: the autogemm kernels are global, not per-context, and teardown doesn't reinitialize them to 0 either.)
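The failure mode described in that note (handles cached globally, surviving teardown, and then reused with a new context) can be illustrated in the abstract. This is only a sketch of the general pattern, not clBLAS code and not the change made in the pull request: `KernelCache`, `ContextId`, and `KernelHandle` are hypothetical, and "building" a kernel is faked with a counter.

```cpp
#include <map>

// Hypothetical stand-ins; none of these names come from clBLAS.
using ContextId = int;
using KernelHandle = int;  // faked: a real cache would hold a cl_kernel

class KernelCache {
public:
    // Returns the handle for this context, "building" it on first use.
    KernelHandle get(ContextId ctx) {
        auto it = kernels_.find(ctx);
        if (it != kernels_.end()) {
            return it->second;
        }
        KernelHandle handle = ++nextHandle_;  // stand-in for compiling a kernel
        kernels_[ctx] = handle;
        return handle;
    }

    // Called on teardown: forget every handle so nothing stale is reused.
    void clear() { kernels_.clear(); }

private:
    std::map<ContextId, KernelHandle> kernels_;  // keyed per context
    KernelHandle nextHandle_ = 0;
};
```

Keying by context keeps a handle built for one context from being handed to another, and clearing on teardown forces a rebuild after re-initialization.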
Got an error using the Caffe OpenCL port https://github.com/BVLC/caffe/pull/2610 It seems clBLAS is producing an incorrect kernel. Error message:
Inside the sgemm_Col_TN_B1_MX032_NX032_KX16_BRANCH_src kernel we see
Possibly this is caused by my testing it on data with input dimension 1x1 (trying to solve a cosine problem), which is very non-standard for most Caffe and clBLAS usage scenarios. Platforms tested: AMD and Beignet (Intel® GPU)