amd / OpenCL-caffe

This is an experimental version of OpenCL Caffe by AMD Research. We now recommend the official BVLC Caffe OpenCL branch at https://github.com/BVLC/caffe/tree/opencl

runtest: Failed to build program #43

Closed aelnouby closed 8 years ago

aelnouby commented 8 years ago

I can't get my GPU to work. I have an AMD Radeon HD 7500M/7600M Series GPU.

Clinfo output : https://gist.github.com/aelnouby/6cf790fbbdfd56b0727dee9fa24ac2c3

make runtest : https://gist.github.com/aelnouby/dff602e1133f5f78ac0a54bb2223a69c

set_mode_gpu() : https://gist.github.com/aelnouby/d23d417b70ab8d3d6c4b4815f661751d

smistad commented 8 years ago

Had the same problem when building from a separate build directory.

Changing

std::string oclKernelPath = "./src/caffe/ocl/";

to

std::string oclKernelPath = "./../src/caffe/ocl/";

in src/caffe/device.cpp fixed the problem. However, the path should be set in a better way.
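One possible way to set the path more robustly, as a minimal sketch only: read the kernel directory from an environment variable and fall back to the default string from src/caffe/device.cpp. The variable name CAFFE_OCL_KERNEL_DIR and the helper resolveKernelDir are hypothetical; neither exists in OpenCL-caffe.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper, not part of OpenCL-caffe: pick the kernel directory
// from an environment variable so the binary can run from any build directory.
// CAFFE_OCL_KERNEL_DIR is an assumed name chosen for this illustration.
std::string resolveKernelDir(const char* envName = "CAFFE_OCL_KERNEL_DIR",
                             const std::string& fallback = "./src/caffe/ocl/") {
    const char* value = std::getenv(envName);
    std::string dir = (value != nullptr && *value != '\0') ? std::string(value)
                                                           : fallback;
    // Callers append file names to this string, so keep a trailing slash.
    if (dir.empty() || dir.back() != '/') dir += '/';
    return dir;
}
```

With something like this, the assignment in device.cpp could become std::string oclKernelPath = resolveKernelDir(); and a build in a separate directory would only need the variable exported before running.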

aelnouby commented 8 years ago

I did something very similar to fix "Err: Open ocl dir failed!": I explicitly set the Caffe root instead of "."

However, this didn't fix the "Failed to build program" error.

aelnouby commented 8 years ago

@gujunli Could you help me with this, please ?

gujunli commented 8 years ago

Hi Yibing,

Could you take a look at these path problems?

Sorry guys for the late reply. Just a notice that we no longer work for AMD, so I don't think we can keep maintaining the project for AMD. But this is an interesting project. We might move the folder to our personal GitHub to maintain it as an open source project.

Thanks a lot! Junli

On May 4, 2016, at 3:22 AM, Alaa El-Nouby notifications@github.com wrote:

@gujunli https://github.com/gujunli Could you help me with this, please ?


kuke commented 8 years ago

@smistad When building and running caffe, you are assumed to be in the ROOT directory, which can be verified from the path settings in the other *.sh scripts.

@aelnouby Besides the build location, have you made sure that the ocl dir is present and readable?

aelnouby commented 8 years ago

@kuke I believe so: it is present and readable. Please tell me if there is a way to check.
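One way to check this, sketched with C++17 std::filesystem: confirm the directory can be listed and that every *.cl file in it can be opened for reading. The names KernelDirReport and checkKernelDir are made up for this illustration and are not part of OpenCL-caffe.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Result of the check; all names here are hypothetical.
struct KernelDirReport {
    bool dirReadable = false;             // directory exists and can be listed
    std::size_t kernelCount = 0;          // number of *.cl files found
    std::vector<std::string> unreadable;  // *.cl files that failed to open
};

KernelDirReport checkKernelDir(const std::string& dir) {
    KernelDirReport report;
    std::error_code ec;
    fs::directory_iterator it(dir, ec);   // sets ec if dir is missing or unlistable
    if (ec) return report;
    report.dirReadable = true;
    for (const auto& entry : it) {
        if (entry.path().extension() != ".cl") continue;
        ++report.kernelCount;
        std::ifstream in(entry.path());
        if (!in.is_open()) report.unreadable.push_back(entry.path().string());
    }
    return report;
}
```

Calling checkKernelDir("./src/caffe/ocl/") from the Caffe root should report dirReadable as true, a non-zero kernelCount, and an empty unreadable list if the directory is in good shape.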

I think the problem is in the line cl_int iStatus = clBuildProgram(Program, 1, pDevices, buildOption.c_str(), The return value is -11, which is CL_BUILD_PROGRAM_FAILURE.

I honestly don't know what this means exactly.
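For reference, -11 (CL_BUILD_PROGRAM_FAILURE) means the OpenCL compiler could not build the kernel source for the device. The compiler's own messages can be read with clGetProgramBuildInfo and CL_PROGRAM_BUILD_LOG. As a small aid when reading logs, the sketch below maps a few status codes to their names from CL/cl.h. The helper clStatusName is hypothetical, and the list is deliberately partial.

```cpp
#include <string>

// Hypothetical helper: names for a handful of OpenCL status codes.
// The numeric values are the ones defined in CL/cl.h; codes not listed
// here are reported numerically.
std::string clStatusName(int status) {
    switch (status) {
        case 0:   return "CL_SUCCESS";
        case -1:  return "CL_DEVICE_NOT_FOUND";
        case -2:  return "CL_DEVICE_NOT_AVAILABLE";
        case -3:  return "CL_COMPILER_NOT_AVAILABLE";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -6:  return "CL_OUT_OF_HOST_MEMORY";
        case -11: return "CL_BUILD_PROGRAM_FAILURE";
        case -43: return "CL_INVALID_BUILD_OPTIONS";
        case -44: return "CL_INVALID_PROGRAM";
        default:  return "unknown OpenCL status " + std::to_string(status);
    }
}
```

Logging clStatusName(iStatus) next to the build log would distinguish a kernel that the compiler rejected (-11) from build options it rejected (-43).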

kuke commented 8 years ago

@aelnouby Your logs indicate that the device worked properly, and so did access to the ocl files. The build failure very likely results from syntax errors in the *.cl files. Have you changed the ocl source code, intentionally or unintentionally?

aelnouby commented 8 years ago

I actually removed everything and made a fresh clone of the repo, and the exact same issue happened.

Some information that might or might not be useful: before installing opencl-caffe I was using regular caffe, so I don't know whether that might cause issues with environment variables and the like.

Also, if I use sudo clinfo I get the CPU only; no GPU is reported.

kuke commented 8 years ago

Maybe your AMD driver has some problems. Have you ever successfully built a simple OpenCL program?

aelnouby commented 8 years ago

I only tried the HelloWorld in the /opt/AMDAPPSDK-3.0/samples/opencl/bin/x86_64 directory, and it worked fine. Could you tell me anything else to try?

aelnouby commented 8 years ago

I have just tried this example by @smistad.

Output of gcc -I /opt/AMDAPPSDK-3.0/include -L/opt/AMDAPPSDK-3.0/lib/x86_64/ -o main main.c -Wl,-rpath,/opt/AMDAPPSDK-3.0/lib/x86_64/ -lOpenCL

main.c: In function ‘main’:
main.c:50:5: warning: ‘clCreateCommandQueue’ is deprecated [-Wdeprecated-declarations]
     cl_command_queue command_queue = clCreateCommandQueue(context, device_id, 0, &ret);
     ^
In file included from main.c:7:0:
/opt/AMDAPPSDK-3.0/include/CL/cl.h:1359:1: note: declared here
 clCreateCommandQueue(cl_context /* context */,

Output of ./main :

0 + 1024 = 1024
1 + 1023 = 1024
2 + 1022 = 1024
3 + 1021 = 1024
4 + 1020 = 1024
5 + 1019 = 1024
6 + 1018 = 1024
7 + 1017 = 1024
8 + 1016 = 1024
9 + 1015 = 1024
10 + 1014 = 1024
11 + 1013 = 1024
12 + 1012 = 1024
13 + 1011 = 1024
............... till the end

Which I believe is the correct behaviour.

kuke commented 8 years ago

It seems quite weird. We just tested OpenCL caffe on several server-end GPUs. I can't give you a reasonable explanation yet.

aelnouby commented 8 years ago

@kuke Thanks a lot for your time. Are there any recommendations for this situation, such as reinstalling something or trying a different OS? I desperately need this to work.

kuke commented 8 years ago

I thought of one important thing. You installed the CUDA SDK on the machine when using regular Caffe, right? You'd better remove it cleanly and reinstall the AMD driver.

aelnouby commented 8 years ago

I didn't install the CUDA SDK; I was using the CPU_ONLY option.

aelnouby commented 8 years ago

I actually installed opencl-catalyst and opencl-headers from the AUR.

The file in /etc/OpenCL/vendors is amdocl64.icd

I don't know if this information is relevant.
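Some background on that file: on Linux, the OpenCL ICD loader discovers vendor runtimes through the *.icd files in /etc/OpenCL/vendors, and each file names the vendor library to load. A quick way to see which runtimes are registered is to print each file with its first line. The sketch below does that; the helper name listIcdEntries is hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <map>
#include <string>
#include <system_error>

// Hypothetical helper: map each *.icd file in vendorsDir to the first line
// it contains, which names the vendor's OpenCL runtime library.
std::map<std::string, std::string> listIcdEntries(
        const std::string& vendorsDir = "/etc/OpenCL/vendors") {
    namespace fs = std::filesystem;
    std::map<std::string, std::string> entries;
    std::error_code ec;
    fs::directory_iterator it(vendorsDir, ec);  // ec is set if the dir is missing
    if (ec) return entries;
    for (const auto& entry : it) {
        if (entry.path().extension() != ".icd") continue;
        std::ifstream in(entry.path());
        std::string library;
        std::getline(in, library);              // stays empty if unreadable
        entries[entry.path().filename().string()] = library;
    }
    return entries;
}
```

If more than one entry shows up, or the named library cannot be found on the system, a mismatch between the packaged runtime and the installed driver is a plausible suspect.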

kuke commented 8 years ago

Yes. It is possible that the open source OpenCL runtime doesn't support the build options. You can try the official driver from AMD.

Noplz commented 8 years ago

@aelnouby We haven't tested on Arch before, but I think the problem is in the OpenCL header files. You can try downloading and reinstalling everything related to OpenCL from the official AMD website instead of from the AUR.

pttypn commented 8 years ago

I have the same problems with an AMD Radeon HD 6570. I'm using fglrx 15.2, AMDAPPSDK-3.0, and ACML6, all from the AMD website. I also tried AMDAPP-2.9.1 and ACML-5.3.1, but received the same sorts of errors. It might be a problem with clBLAS, or clBLAS is running into the same problem as caffe. I tried a bunch of different clBLAS versions (both from source and the binaries), and test-functional always fails while running the ERROR tests.

clinfo: https://gist.github.com/patmarks/f6a47f9db528d33a0ab6def34ca4c89b

Here's the result of running ./build/test/test.testbin -alsologtostderr=1 from the caffe root directory: https://gist.github.com/patmarks/39bb7e150bee0c1256efc44da498b911

I was also able to build the simple example by @smistad (although I had to modify the makefile so that gcc uses '-L' to find libOpenCL.so).

@aelnouby what happens when you run test-functional? Mine is aborted while running the ERROR tests. https://gist.github.com/patmarks/3992f43f981c2b7759165e55394e36da

OpenCL error -11 on line 244 of /home/pm/Documents/jupyter/opencl/clBLAS1/src/library/blas/xgemm.cc
test-functional: /home/pm/Documents/jupyter/opencl/clBLAS1/src/library/blas/xgemm.cc:244: void makeGemmKernel(_cl_kernel**, cl_command_queue, const char*, const char*, const unsigned char**, size_t*, const char*): Assertion `false' failed.
Aborted

gstoner commented 8 years ago

This was an experimental branch of Caffe for OpenCL. We now recommend the official OpenCL port of Caffe in the BVLC GitHub repo at https://github.com/BVLC/caffe/tree/opencl