amd / OpenCL-caffe

This is an experimental version of OpenCL Caffe by AMD Research. We now recommend that you use the official BVLC Caffe OpenCL branch instead, at https://github.com/BVLC/caffe/tree/opencl

No GPU Device #13

Open ghost opened 9 years ago

ghost commented 9 years ago

Clearly I have an AMD GPU:

zhoub@zhoub:~$ lspci
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240 OEM]
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]

But it shows that:

xx@xx:~/OpenCL-caffe/build$ make runtest
[ 1%] Built target proto
[ 55%] Built target caffe
[ 55%] Built target gtest
[100%] Built target test.testbin
Current device id: 0
F1006 20:10:38.865067 11580 device.cpp:75] Err: No GPU devices
*** Check failure stack trace: ***
    @     0x2b5fc1facdaa  (unknown)
    @     0x2b5fc1facce4  (unknown)
    @     0x2b5fc1fac6e6  (unknown)
    @     0x2b5fc1faf687  (unknown)
    @     0x2b5fc1941c44  caffe::Device::Init()
    @           0x6f4e48  main
    @     0x2b5fc3cc5ec5  (unknown)
    @           0x6f7d12  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)
make[3]: *** [src/caffe/test/CMakeFiles/runtest] Error 134
make[2]: *** [src/caffe/test/CMakeFiles/runtest.dir/all] Error 2
make[1]: *** [src/caffe/test/CMakeFiles/runtest.dir/rule] Error 2
make: *** [runtest] Error 2

hughperkins commented 9 years ago

What is output of clinfo?

ghost commented 9 years ago

xx@xx:~$ clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (1800.8)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_CPU
Vendor ID: 1002h
Board name:
Max compute units: 8
Max work items dimensions: 3
  Max work items[0]: 1024
  Max work items[1]: 1024
  Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 8
Preferred vector width double: 4
Native vector width char: 16
Native vector width short: 8
Native vector width int: 4
Native vector width long: 2
Native vector width float: 8
Native vector width double: 4
Max clock frequency: 3605Mhz
Address bits: 64
Max memory allocation: 2147483648
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 64
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
  Denorms: Yes
  Quiet NaNs: Yes
  Round to nearest even: Yes
  Round to zero: Yes
  Round to +ve and infinity: Yes
  IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 32768
Global memory size: 8312074240
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 2147483648
Max global variable size: 1879048192
Max global variable preferred total size: 1879048192
Max read/write image args: 64
Max on device events: 0
Queue on device max size: 0
Max on device queues: 0
Queue on device preferred size: 0
SVM capabilities:
  Coarse grain buffer: No
  Fine grain buffer: No
  Fine grain system: No
  Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 1
Error correction support: 0
Unified memory for Host and Device: 1
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
  Execute OpenCL kernels: Yes
  Execute native function: Yes
Queue on Host properties:
  Out-of-Order: No
  Profiling : Yes
Queue on Device properties:
  Out-of-Order: No
  Profiling : No
Platform ID: 0x7fda8a12f430
Name: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Vendor: GenuineIntel
Device OpenCL C version: OpenCL C 1.2
Driver version: 1800.8 (sse2,avx)
Profile: FULL_PROFILE
Version: OpenCL 1.2 AMD-APP (1800.8)
Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event

hughperkins commented 9 years ago

From the clinfo output:

Device Type: CL_DEVICE_TYPE_CPU

That's a CPU, i.e. not a GPU, an integrated GPU, or an APU. I guess that AMD Caffe only supports GPUs (maybe APUs?), though an AMD dev would need to confirm that, or someone would need to browse through the code a bit.

hughperkins commented 9 years ago

From the code, device.cpp:

  clGetDeviceIDs(PlatformIDs[0], CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);
  uiNumDevices = numDevices;
  if (0 == uiNumDevices) {
    LOG(FATAL) << "Err: No GPU devices";
  }

The code is only looking for GPU devices, not CPU devices.
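To make the failure mode concrete, here is a minimal C++ sketch of how a device's type bitfield decides whether the CL_DEVICE_TYPE_GPU query above can see it. The bit values are copied by hand from the OpenCL header CL/cl.h so the sketch builds without an OpenCL SDK; describe_device_type and matches_gpu_query are hypothetical helpers for illustration, not functions from Caffe or from OpenCL itself.

```cpp
#include <cstdint>
#include <string>

// Local copies of the device-type bits defined in CL/cl.h, so this sketch
// compiles without an OpenCL SDK. cl_device_type is a 64-bit bitfield.
using device_type_t = std::uint64_t;
constexpr device_type_t kTypeDefault = 1u << 0;      // CL_DEVICE_TYPE_DEFAULT
constexpr device_type_t kTypeCpu = 1u << 1;          // CL_DEVICE_TYPE_CPU
constexpr device_type_t kTypeGpu = 1u << 2;          // CL_DEVICE_TYPE_GPU
constexpr device_type_t kTypeAccelerator = 1u << 3;  // CL_DEVICE_TYPE_ACCELERATOR

// Spells out every type bit that is set, e.g. "CL_DEVICE_TYPE_CPU".
std::string describe_device_type(device_type_t type) {
  std::string out;
  auto append = [&out](const char* name) {
    if (!out.empty()) out += " | ";
    out += name;
  };
  if (type & kTypeDefault) append("CL_DEVICE_TYPE_DEFAULT");
  if (type & kTypeCpu) append("CL_DEVICE_TYPE_CPU");
  if (type & kTypeGpu) append("CL_DEVICE_TYPE_GPU");
  if (type & kTypeAccelerator) append("CL_DEVICE_TYPE_ACCELERATOR");
  return out.empty() ? std::string("unknown") : out;
}

// Models the filter in device.cpp: a query for CL_DEVICE_TYPE_GPU only
// matches devices whose type includes the GPU bit.
bool matches_gpu_query(device_type_t type) { return (type & kTypeGpu) != 0; }
```

Applied to the clinfo output in this issue, describe_device_type(kTypeCpu) returns "CL_DEVICE_TYPE_CPU" and matches_gpu_query(kTypeCpu) is false, which is consistent with the "No GPU devices" failure.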

ghost commented 9 years ago

Thanks for your help, but you can see here:

zhoub@zhoub:~$ lspci
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240 OEM]

that I clearly have a GPU. Why can't OpenCL discover it automatically?

hughperkins commented 9 years ago

You're probably missing the matching OpenCL drivers for your GPU card, or there is some issue with their installation. Until the GPU appears in the clinfo output, AMD Caffe won't find it.

kuke commented 9 years ago

Apparently clinfo cannot recognise your GPU device. You should fix that first; maybe try reinstalling the driver.

keryell commented 9 years ago

On the other hand it would be nice that Caffe supports any kind of device, especially useful for debugging. :-)

hughperkins commented 9 years ago

On the other hand it would be nice that Caffe supports any kind of device, especially useful for debugging. :-)

Well... personally, for cltorch I initially allowed all devices, but then I got tons of support requests caused by running on the CPU part, so now I only show GPUs and APUs.
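Both positions (allow any device for debugging, or show only GPUs and accelerators as cltorch does) fit one small filter with an opt-in CPU flag. This is a minimal sketch, assuming the application has already listed every device (for example with CL_DEVICE_TYPE_ALL) and read each one's CL_DEVICE_TYPE; usable_devices and allow_cpu are hypothetical names, not cltorch's or Caffe's actual code. Hugh later clarifies that by "APUs" he meant CL_DEVICE_TYPE_ACCELERATOR devices, which is what the sketch keeps.

```cpp
#include <cstdint>
#include <vector>

// Local copies of the device-type bits from CL/cl.h (cl_device_type is a
// 64-bit bitfield), so the sketch builds without an OpenCL SDK.
using device_type_t = std::uint64_t;
constexpr device_type_t kTypeCpu = 1u << 1;          // CL_DEVICE_TYPE_CPU
constexpr device_type_t kTypeGpu = 1u << 2;          // CL_DEVICE_TYPE_GPU
constexpr device_type_t kTypeAccelerator = 1u << 3;  // CL_DEVICE_TYPE_ACCELERATOR

// Given the CL_DEVICE_TYPE of every device on a platform, returns the
// indices worth offering to users: GPUs and accelerators always, CPUs only
// when the caller explicitly opts in (for example while debugging).
std::vector<int> usable_devices(const std::vector<device_type_t>& types,
                                bool allow_cpu) {
  device_type_t wanted = kTypeGpu | kTypeAccelerator;
  if (allow_cpu) wanted |= kTypeCpu;
  std::vector<int> picked;
  for (int i = 0; i < static_cast<int>(types.size()); ++i) {
    if (types[i] & wanted) picked.push_back(i);
  }
  return picked;
}
```

Filtering on the client side keeps the policy in one place and makes CPU use an explicit choice, which addresses the support-request problem while still allowing CPU debugging.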

ghost commented 9 years ago

I have reinstalled the AMD OpenCL 2 driver, but clinfo still does not show the GPU.

hughperkins commented 9 years ago

It's out of scope for Caffe I reckon, but anyway, if it was me, I would check the following things:

jgoldsAMD commented 9 years ago

Note that an APU is CPU + GPU, so if you have an APU, and you have the proper drivers, you will see two OpenCL devices: one for the CPU and one for the GPU.

From: Hugh Perkins [mailto:notifications@github.com] Sent: Tuesday, October 06, 2015 7:17 AM To: amd/OpenCL-caffe Subject: Re: [OpenCL-caffe] No GPU Device (#13)


AMD Caffe supports only GPUs. Not APUs, nor CPUs.


ghost commented 9 years ago

Thank you!

hughperkins commented 9 years ago

@jgoldsAMD Well, I meant CL_DEVICE_TYPE_ACCELERATOR. I think this device type is only used by certain Intel devices currently. I'm not sure if APU is the correct abbreviation for this; possibly I used the wrong abbreviation :-P

hughperkins commented 9 years ago

(seems like this device type is used by Intel Xeon Phi Coprocessors, eg https://software.intel.com/en-us/node/540596 https://software.intel.com/en-us/node/540573 )

jgoldsAMD commented 9 years ago

That makes sense. Perhaps FPGA devices appear as ACCELERATORS as well; I don’t have any experience with them.

From: Hugh Perkins [mailto:notifications@github.com] Sent: Tuesday, October 06, 2015 9:44 AM To: amd/OpenCL-caffe Cc: Golds, Jeff Subject: Re: [OpenCL-caffe] No GPU Device (#13)


gujunli commented 9 years ago

@hughperkins @jgoldsAMD We support two ways of selecting devices in the code. You can specify a device ID (which can be a GPU or an APU). If no device ID is specified, we will find a GPU for you. Please refer to the following code. Do you have suggestions for making it better or easier for users? Please let us know.

  if (deviceId == -1) {
    int i;
    for (i = 0; i < (int) uiNumDevices; i++) {
      clGetDeviceInfo(pDevices[i], CL_DEVICE_HOST_UNIFIED_MEMORY, sizeof(cl_bool), &unified_memory, NULL);
      if (!unified_memory) { // skip iGPU
        // we pick the first dGPU we found
        pDevices[0] = pDevices[i];
        device_id = i;
        LOG(INFO) << "Picked default device type : dGPU " << device_id;
        break;
      }
    }
    if (i == uiNumDevices) {
      LOG(FATAL) << "Cannot find any dGPU! ";
    }
  } else if (deviceId >= 0 && deviceId < uiNumDevices) {
    pDevices[0] = pDevices[deviceId];
    device_id = deviceId;
    LOG(INFO) << "Picked device type : GPU " << device_id;
  } else {
    LOG(FATAL) << " Invalid GPU deviceId! ";
  }
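The selection rules in the snippet above can be restated as a small pure function, which makes the edge cases easy to test. This is a sketch, not AMD's code: DeviceInfo and pick_device are hypothetical names, the host_unified_memory flag stands in for the CL_DEVICE_HOST_UNIFIED_MEMORY query, and std::nullopt marks the cases where the original calls LOG(FATAL).

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-in for what the real code reads via clGetDeviceInfo;
// only CL_DEVICE_HOST_UNIFIED_MEMORY matters for this policy.
struct DeviceInfo {
  bool host_unified_memory;  // true for iGPU/APU-style devices sharing host memory
};

// Mirrors the quoted policy:
//  * requested_id == -1: first device without host-unified memory (a dGPU)
//  * 0 <= requested_id < count: that device, whatever its kind
//  * anything else: no valid device
// Returns std::nullopt where the original would LOG(FATAL).
std::optional<int> pick_device(const std::vector<DeviceInfo>& devices,
                               int requested_id) {
  const int count = static_cast<int>(devices.size());
  if (requested_id == -1) {
    for (int i = 0; i < count; ++i) {
      if (!devices[i].host_unified_memory) return i;  // first dGPU wins
    }
    return std::nullopt;  // "Cannot find any dGPU!"
  }
  if (requested_id >= 0 && requested_id < count) return requested_id;
  return std::nullopt;  // "Invalid GPU deviceId!"
}
```

One consequence worth noting: on a machine whose only GPU-type device has host-unified memory (an APU or iGPU), the default path fails with "Cannot find any dGPU!", and the device has to be selected by passing its ID explicitly.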

gujunli commented 9 years ago

@jgoldsAMD Hi Jeff, thanks very much for your reply. It is very nice to see you following this code. I guess there will be more questions about OpenCL drivers and such, with which we will need your help.

gujunli commented 9 years ago

@zhoubinxyz Are you accessing your machine remotely? We once ran into the same issue when accessing a machine remotely through a newly created account. I think we might have some issues in remote access support, and it seems only one user can see the GPU at a time. @jgoldsAMD do you know about this issue and how to solve it?

jgoldsAMD commented 9 years ago

AFAIK, headless support was working with latest drivers. There was an issue that you had to be root to run on the GPU, but I believe that was addressed as well. You can try starting X to see if that makes a difference, but it shouldn’t be required.

From: Junli Gu [mailto:notifications@github.com] Sent: Tuesday, October 06, 2015 1:08 PM To: amd/OpenCL-caffe Cc: Golds, Jeff Subject: Re: [OpenCL-caffe] No GPU Device (#13)


ghost commented 9 years ago

@gujunli No, I am not using remote access. What other information can I provide? It would help us a lot if we could make it run on this type of GPU.

hughperkins commented 9 years ago

@zhoubinxyz: assuming you have the latest drivers for your card installed, Jeff's and Junli's heads-up about remote access, use of root, and X on/off are probably good things to try, I reckon. For example, try various combinations of stopping or starting X and running as root or not:

sudo service lightdm stop
ps -ef | grep X  # make sure no x server running
clinfo   #  does this show the GPU?
sudo clinfo   # this?

Edit: I remember now that on my own amd-box:

ghost commented 9 years ago

@hughperkins Hi, after stopping lightdm, I logged in on a tty. ps -ef | grep X shows:

xx 2446 2303 0 13:11 tty1 00:00:00 grep --color=auto X

clinfo shows the same information as before.

But sudo clinfo shows:

clinfo: error while loading shared libraries: libOpenCL.so.1: cannot open shared object file: No such file or directory

jgoldsAMD commented 9 years ago

Since you are seeing the CPU device, I can only assume that the library path is not the same for root as for your user, as clinfo relies on the OpenCL library being present.

Which graphics driver have you installed? Not sure I saw that info earlier.
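The library-path theory can be checked by looking for libOpenCL.so.1 along LD_LIBRARY_PATH in both environments. This is a minimal sketch, assuming the OpenCL library is made visible to your user through LD_LIBRARY_PATH rather than through the system loader configuration; sudo typically does not pass LD_LIBRARY_PATH through to the command it runs. find_in_search_path and opencl_lib_from_env are hypothetical helpers, and the real dynamic loader also consults /etc/ld.so.cache and its default directories, which this sketch ignores.

```cpp
#include <cstdlib>
#include <filesystem>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>

// Looks for `lib` in each directory of a colon-separated list (the format
// of LD_LIBRARY_PATH). Empty entries are simply skipped in this sketch.
std::optional<std::string> find_in_search_path(const std::string& dirs,
                                               const std::string& lib) {
  std::istringstream in(dirs);
  std::string dir;
  while (std::getline(in, dir, ':')) {
    if (dir.empty()) continue;
    const std::filesystem::path candidate = std::filesystem::path(dir) / lib;
    std::error_code ec;  // non-throwing overload: unreadable dirs count as "not found"
    if (std::filesystem::exists(candidate, ec)) return candidate.string();
  }
  return std::nullopt;
}

// Reports where libOpenCL.so.1 would be found via this process's
// LD_LIBRARY_PATH; build once, then run it both as your user and with sudo.
std::optional<std::string> opencl_lib_from_env() {
  const char* env = std::getenv("LD_LIBRARY_PATH");
  return find_in_search_path(env ? env : "", "libOpenCL.so.1");
}
```

If the lookup succeeds for your user but fails under sudo, the library's directory is only known through your shell environment, which matches the "cannot open shared object file" error above.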

From: zhoubinxyz [mailto:notifications@github.com] Sent: Tuesday, October 06, 2015 11:16 PM To: amd/OpenCL-caffe Cc: Golds, Jeff Subject: Re: [OpenCL-caffe] No GPU Device (#13)


kuke commented 9 years ago

@zhoubinxyz, have you resolved all the complaints from the driver installation, and run "aticonfig --initial" after installing? If the answer is yes, then try "export DISPLAY=:0.0".

gujunli commented 9 years ago

@zhoubinxyz Just checking in: did you fix the problem?

ghost commented 9 years ago

Sorry, I have been busy these days. I will update if I get news, thank you :)

hughperkins commented 8 years ago

Hey! Apparently the solution to this issue might be to use sudo su, see https://github.com/hughperkins/cltorch/issues/74#issuecomment-222333852

kkarnatak commented 7 years ago

Hi,

Is there any solution for this problem? I installed OpenCL on an AMD ATI Radeon HD 7600M Series and am getting the same error.

~/OpenCL-caffe$ clinfo | grep CPU
  Device Type:                   CL_DEVICE_TYPE_CPU

~/OpenCL-caffe$ clinfo | grep GPU
(no output)

However, glxinfo does show the AMD GPU:

~/OpenCL-caffe$ glxinfo  | grep OpenGL
OpenGL vendor string: Advanced Micro Devices, Inc.
OpenGL renderer string: AMD Radeon HD 7600M Series
OpenGL core profile version string: 4.3.13416 Core Profile Context 15.201.1151

ERROR:

~/OpenCL-caffe$ make runtest
[100%] Built target proto
[100%] Built target caffe
[100%] Built target gtest
[100%] Built target test.testbin
Current device id: 0
F0304 18:49:09.351958  1995 device.cpp:75] Err: No GPU devices
*** Check failure stack trace: ***
    @     0x2b783ee9cdaa  (unknown)
    @     0x2b783ee9cce4  (unknown)
    @     0x2b783ee9c6e6  (unknown)
    @     0x2b783ee9f687  (unknown)
    @     0x2b783e8322a4  caffe::Device::Init()
    @           0x6f4468  main
    @     0x2b7840bb5f45  (unknown)
    @           0x6f7d02  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)

naibaf7 commented 7 years ago

Try this: https://github.com/BVLC/caffe/tree/opencl