BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Failed tests after installation of OpenCL branch on Linux; ViennaCL: Could not find kernel #6258

Open vishwanathkr opened 6 years ago

vishwanathkr commented 6 years ago

Hello,

I have been trying to get a working build of Caffe-OpenCL on Linux with an Intel integrated GPU and Intel's OpenCL driver. I built the latest code from the OpenCL branch, following the instructions from #5099. The build succeeds, but many tests fail.

Steps to Reproduce

Running make runtests -j8 fails many tests with errors of the following nature:

ViennaCL: FATAL ERROR: Could not find kernel 'max_pool_forward_float' from program ''
ViennaCL: FATAL ERROR: Could not find kernel 'im2col_double' from program ''
...
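To see at a glance which kernels are reported missing, and whether they are float or double variants, the error lines can be tallied from a saved log. The following is a minimal sketch, assuming the log is available as text and that its error lines match the format quoted above; the function name missing_kernels is hypothetical and not part of Caffe or ViennaCL.

```python
import re
from collections import Counter

# Matches the ViennaCL error format quoted in this report.
KERNEL_ERROR = re.compile(r"Could not find kernel '([^']+)' from program")


def missing_kernels(log_text):
    """Count how often each kernel name appears in 'Could not find kernel' errors."""
    return Counter(KERNEL_ERROR.findall(log_text))


if __name__ == "__main__":
    # The two error lines quoted in this report, used as sample input.
    sample = (
        "ViennaCL: FATAL ERROR: Could not find kernel "
        "'max_pool_forward_float' from program ''\n"
        "ViennaCL: FATAL ERROR: Could not find kernel "
        "'im2col_double' from program ''\n"
    )
    for name, count in sorted(missing_kernels(sample).items()):
        print(name, count)
```

If both _float and _double kernels show up in the tally, the problem is unlikely to be limited to missing double-precision support.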

Here is the Makefile.config (relevant parameters only) that I used for the build. I enabled Intel_Spatial, ViennaCL, libDNN and OpenCV 3.1.

Some outputs to help debug:

  1. ./build/test/test_all.testbin --gtest_filter=OpenCLKernelCompileTest 0
  2. clinfo
  3. ./build/tools/caffe device_query

System configuration

Operating system: Ubuntu 16.04.3
Compiler: g++ 5.4
CUDA version (if applicable): NA
CUDNN version (if applicable): NA
BLAS: Atlas
Python or MATLAB version (for pycaffe and matcaffe respectively): 2.7.13
CPU: Intel i7-7700HQ (7th gen)
GPU: Intel HD Graphics, NVIDIA 1050 (not in use)

naibaf7 commented 6 years ago

Which GPU drivers are you running for your iGPU? Updating them or switching to Beignet might help. It could also be that the tests fail simply because the double data type is not supported. What does the whole runtest log look like? Have you tried to run any actual networks?
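One way to check the double-precision theory is to look at the extension list the device reports (clinfo prints it). In OpenCL, double support is advertised through the cl_khr_fp64 extension. The following is a minimal sketch that checks an extension string for it; the function name supports_fp64 and the sample strings are illustrative assumptions, not output from this system.

```python
def supports_fp64(extensions):
    """Return True if a space-separated OpenCL extension string lists cl_khr_fp64."""
    return "cl_khr_fp64" in extensions.split()


if __name__ == "__main__":
    # Hypothetical extension strings, for illustration only.
    with_double = "cl_khr_icd cl_khr_fp64 cl_khr_byte_addressable_store"
    without_double = "cl_khr_icd cl_khr_fp16"
    print(supports_fp64(with_double))
    print(supports_fp64(without_double))
```

Splitting on whitespace before comparing avoids a false match on a longer extension name that merely contains the same substring.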

vishwanathkr commented 6 years ago

The iGPU and OpenCL drivers installed are the ones from Intel. Here is the entire runtest-log. I haven't run any network yet.

naibaf7 commented 6 years ago

OK, the driver is definitely not working as it should. Try the Beignet drivers instead.

vishwanathkr commented 6 years ago

@naibaf7 I tried with the beignet drivers and after fiddling around with a few things, got it to work. Here is the runtest-log. A few tests still fail and I would like your thoughts on that.

naibaf7 commented 6 years ago

@vishwanathkr I'm only concerned about SyncedMemoryTest.TestGPURead; this one should not fail. It could indicate a problem with copying results back from the GPU; confirm this by running a real network.

The same goes for TestReshape, although you can ignore it if you're not using double precision. It is an odd one, since it's a CPU test and really should not fail. The LibDNN error is probably because the LibDNN backward deconvolution is numerically unstable on your GPU; ignore it for now. All common networks should still work. TestSharedWeightsUpdate can be ignored; multi-GPU is not supported anyway.
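To triage a runtest log along these lines, it helps to first pull out the unique names of the failed tests. Google Test marks them with "[  FAILED  ]" lines, so a minimal sketch could look like the following. The function name failed_tests, the suite name ExampleNetTest and the sample log lines are hypothetical and not taken from the attached log.

```python
import re

# Google Test prints "[  FAILED  ] Suite.Test" for each failed test, both
# inline and in the final summary. The "N tests, listed below:" summary line
# has no "Suite.Test" token and is therefore skipped by this pattern.
FAILED_LINE = re.compile(r"^\[  FAILED  \] ([\w/]+\.[\w/]+)", re.MULTILINE)


def failed_tests(log_text):
    """Return the sorted, de-duplicated names of failed tests in a gtest log."""
    return sorted(set(FAILED_LINE.findall(log_text)))


if __name__ == "__main__":
    # Made-up sample lines in Google Test's output format.
    sample = (
        "[  FAILED  ] SyncedMemoryTest.TestGPURead (3 ms)\n"
        "[  FAILED  ] 2 tests, listed below:\n"
        "[  FAILED  ] SyncedMemoryTest.TestGPURead\n"
        "[  FAILED  ] ExampleNetTest/2.TestReshape, where TypeParam = SomeType\n"
    )
    for name in failed_tests(sample):
        print(name)
```

The resulting list can then be compared against the tests named above to separate the failures that matter from the ones that can be ignored.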