amd / OpenCL-caffe

This is an experimental version of OpenCL Caffe by AMD Research. We now recommend that you use the official BVLC Caffe OpenCL branch at https://github.com/BVLC/caffe/tree/opencl

Cannot find any dGPU? I use A10-7850 #37

Open x-xiaojian opened 8 years ago

x-xiaojian commented 8 years ago

Could not create logging file: No such file or directory
COULD NOT CREATE A LOGGINGFILE 20160319-002926.5520!
Could not create logging file: No such file or directory
COULD NOT CREATE A LOGGINGFILE 20160319-002926.5520!
Could not create logging file: No such file or directory
COULD NOT CREATE A LOGGINGFILE 20160319-002926.5520!
Could not create logging file: No such file or directory
COULD NOT CREATE A LOGGINGFILE 20160319-002926.5520!
F0319 00:29:26.820322 5520 device.cpp:95] Cannot find any dGPU!
*** Check failure stack trace: ***
    @     0x7fbf1b7d0ea4  (unknown)
    @     0x7fbf1b7d0deb  (unknown)
    @     0x7fbf1b7d07bf  (unknown)
    @     0x7fbf1b7d3a35  (unknown)
    @     0x7fbf1bad9a77  caffe::Device::Init()
    @     0x7fbf1badaedd  caffe::Caffe::Caffe()
    @           0x409865  train()
    @           0x4069d1  main
    @     0x7fbf1aab0a40  (unknown)
    @           0x407019  _start
    @              (nil)  (unknown)
Aborted (core dumped)

gujunli commented 8 years ago

Do you have a GPU?

x-xiaojian commented 8 years ago

Only an APU. This is my clinfo output:

Number of platforms: 1
  Platform Profile: FULL_PROFILE
  Platform Version: OpenCL 2.0 AMD-APP (1729.3)
  Platform Name: AMD Accelerated Parallel Processing
  Platform Vendor: Advanced Micro Devices, Inc.
  Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

Platform Name: AMD Accelerated Parallel Processing
Number of devices: 2
  Device Type: CL_DEVICE_TYPE_GPU
  Vendor ID: 1002h
  Board name: AMD Radeon(TM) R7 Graphics
  Device Topology: PCI[ B#0, D#1, F#0 ]
  Max compute units: 8
  Max work items dimensions: 3
    Max work items[0]: 256
    Max work items[1]: 256
    Max work items[2]: 256
  Max work group size: 256
  Preferred vector width char: 4
  Preferred vector width short: 2
  Preferred vector width int: 1
  Preferred vector width long: 1
  Preferred vector width float: 1
  Preferred vector width double: 1
  Native vector width char: 4
  Native vector width short: 2
  Native vector width int: 1
  Native vector width long: 1
  Native vector width float: 1
  Native vector width double: 1
  Max clock frequency: 720Mhz
  Address bits: 64
  Max memory allocation: 419168256
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 64
  Max image 2D width: 16384
  Max image 2D height: 16384
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 1024
  Alignment (bits) of base address: 2048
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: No
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 16384
  Global memory size: 1676673024
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Scratchpad
  Local memory size: 32768
  Max pipe arguments: 16
  Max pipe active reservations: 16
  Max pipe packet size: 419168256
  Max global variable size: 377251328
  Max global variable preferred total size: 1676673024
  Max read/write image args: 64
  Max on device events: 1024
  Queue on device max size: 524288
  Max on device queues: 1
  Queue on device preferred size: 262144
  SVM capabilities:
    Coarse grain buffer: Yes
    Fine grain buffer: Yes
    Fine grain system: No
    Atomics: Yes
  Preferred platform atomic alignment: 0
  Preferred global atomic alignment: 0
  Preferred local atomic alignment: 0
  Kernel Preferred work group size multiple: 64
  Error correction support: 0
  Unified memory for Host and Device: 1
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: No
  Queue on Host properties:
    Out-of-Order: No
    Profiling : Yes
  Queue on Device properties:
    Out-of-Order: Yes
    Profiling : Yes
  Platform ID: 0x7f4f7f5058f0
  Name: Spectre
  Vendor: Advanced Micro Devices, Inc.
  Device OpenCL C version: OpenCL C 2.0
  Driver version: 1729.3 (VM)
  Profile: FULL_PROFILE
  Version: OpenCL 2.0 AMD-APP (1729.3)
  Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images

  Device Type: CL_DEVICE_TYPE_CPU
  Vendor ID: 1002h
  Board name:
  Max compute units: 4
  Max work items dimensions: 3
    Max work items[0]: 1024
    Max work items[1]: 1024
    Max work items[2]: 1024
  Max work group size: 1024
  Preferred vector width char: 16
  Preferred vector width short: 8
  Preferred vector width int: 4
  Preferred vector width long: 2
  Preferred vector width float: 8
  Preferred vector width double: 4
  Native vector width char: 16
  Native vector width short: 8
  Native vector width int: 4
  Native vector width long: 2
  Native vector width float: 8
  Native vector width double: 4
  Max clock frequency: 3000Mhz
  Address bits: 64
  Max memory allocation: 3909534720
  Image support: Yes
  Max number of images read arguments: 128
  Max number of images write arguments: 64
  Max image 2D width: 8192
  Max image 2D height: 8192
  Max image 3D width: 2048
  Max image 3D height: 2048
  Max image 3D depth: 2048
  Max samplers within kernel: 16
  Max size of kernel argument: 4096
  Alignment (bits) of base address: 1024
  Minimum alignment (bytes) for any datatype: 128
  Single precision floating point capability
    Denorms: Yes
    Quiet NaNs: Yes
    Round to nearest even: Yes
    Round to zero: Yes
    Round to +ve and infinity: Yes
    IEEE754-2008 fused multiply-add: Yes
  Cache type: Read/Write
  Cache line size: 64
  Cache size: 16384
  Global memory size: 15638138880
  Constant buffer size: 65536
  Max number of constant args: 8
  Local memory type: Global
  Local memory size: 32768
  Max pipe arguments: 16
  Max pipe active reservations: 16
  Max pipe packet size: 3909534720
  Max global variable size: 1879048192
  Max global variable preferred total size: 1879048192
  Max read/write image args: 64
  Max on device events: 0
  Queue on device max size: 0
  Max on device queues: 0
  Queue on device preferred size: 0
  SVM capabilities:
    Coarse grain buffer: No
    Fine grain buffer: No
    Fine grain system: No
    Atomics: No
  Preferred platform atomic alignment: 0
  Preferred global atomic alignment: 0
  Preferred local atomic alignment: 0
  Kernel Preferred work group size multiple: 1
  Error correction support: 0
  Unified memory for Host and Device: 1
  Profiling timer resolution: 1
  Device endianess: Little
  Available: Yes
  Compiler available: Yes
  Execution capabilities:
    Execute OpenCL kernels: Yes
    Execute native function: Yes
  Queue on Host properties:
    Out-of-Order: No
    Profiling : Yes
  Queue on Device properties:
    Out-of-Order: No
    Profiling : No
  Platform ID: 0x7f4f7f5058f0
  Name: AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
  Vendor: AuthenticAMD
  Device OpenCL C version: OpenCL C 1.2
  Driver version: 1729.3 (sse2,avx,fma4)
  Profile: FULL_PROFILE
  Version: OpenCL 1.2 AMD-APP (1729.3)
  Extensions: cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event

mpekalski commented 8 years ago

Did you create a log folder in the Caffe directory? It should be at the same level as the data and examples folders.
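The "Could not create logging file: No such file or directory" lines come from glog failing to open a file in a directory that does not exist. Running mkdir log in the Caffe root is the direct fix. If you would rather have a program create the folder itself, here is a minimal C++17 sketch. It assumes the binary runs from the Caffe root and that glog writes into a relative log directory; the thread only says to create the folder and does not confirm this. EnsureLogDir is a hypothetical helper, not part of Caffe:

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not part of Caffe): make sure <caffe_root>/log exists
// before the first log call. Assumes glog is configured to write there.
bool EnsureLogDir(const fs::path& caffe_root) {
  const fs::path log_dir = caffe_root / "log";
  std::error_code ec;
  // Creates missing parents too; returns without error if it already exists.
  fs::create_directories(log_dir, ec);
  if (ec) return false;
  // Non-throwing overload: report failure instead of raising an exception.
  return fs::is_directory(log_dir, ec) && !ec;
}
```

As the next comment shows, this only removes the logging warnings. The "Cannot find any dGPU!" failure is a separate problem.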

x-xiaojian commented 8 years ago

F0321 18:29:37.102532 9402 device.cpp:95] Cannot find any dGPU!
*** Check failure stack trace: ***
    @     0x7efc76358ea4  (unknown)
    @     0x7efc76358deb  (unknown)
    @     0x7efc763587bf  (unknown)
    @     0x7efc7635ba35  (unknown)
    @     0x7efc76661a77  caffe::Device::Init()
    @     0x7efc76662edd  caffe::Caffe::Caffe()
    @           0x409865  train()
    @           0x4069d1  main
    @     0x7efc75638a40  (unknown)
    @           0x407019  _start
    @              (nil)  (unknown)
Aborted (core dumped)

I have created a log folder in the Caffe directory, but I still get the error above.

tseckin commented 8 years ago

I have a similar problem. cmake and make both complete successfully, and I have a Radeon 7850, but make runtest fails with the "Cannot find any dGPU!" error:

...............................
[ 98%] Building CXX object src/caffe/test/CMakeFiles/test.testbin.dir/test_random_number_generator.cpp.o
[100%] Linking CXX executable ../../../test/test.testbin
[100%] Built target test.testbin
Scanning dependencies of target runtest
Current device id: 0
F0322 11:27:36.232533  9290 device.cpp:95] Cannot find any dGPU! 
*** Check failure stack trace: ***
    @     0x7f6b9255fddd  (unknown)
    @     0x7f6b92561cc0  (unknown)
    @     0x7f6b9255f9ac  (unknown)
    @     0x7f6b925626be  (unknown)
    @     0x7f6b92ec70ab  caffe::Device::Init()
    @           0x6ecdb2  main
    @     0x7f6b8ce44700  __libc_start_main
    @           0x6f2139  _start
/bin/sh: line 1:  9290 Aborted                 (core dumped) /home/user/OpenCL-caffe-stable/build/test/test.testbin --gtest_shuffle --gtest_filter="-*GPU*"
src/caffe/test/CMakeFiles/runtest.dir/build.make:57: recipe for target 'src/caffe/test/CMakeFiles/runtest' failed
make[3]: *** [src/caffe/test/CMakeFiles/runtest] Error 134
CMakeFiles/Makefile2:328: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/all' failed
make[2]: *** [src/caffe/test/CMakeFiles/runtest.dir/all] Error 2
CMakeFiles/Makefile2:335: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/rule' failed
make[1]: *** [src/caffe/test/CMakeFiles/runtest.dir/rule] Error 2
Makefile:240: recipe for target 'runtest' failed

My clinfo output:

Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 MESA 10.6.9
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD PITCAIRN
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 MESA 10.6.9
  Driver Version                                  10.6.9
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               16
  Max clock frequency                             860MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              In file included from <built-in>:296:
In file included from <command line>:2:
In file included from /usr/include/clc/clc.h:15:
/usr/include/clc/clctypes.h:3:10: fatal error: 'stddef.h' file not found

  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 0        (n/a)
    float                                                4 / 4       
    double                                               0 / 0        (n/a)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    32, Little-Endian
  Global memory size                              1073741824 (1024MiB)
  Error Correction support                        No
  Max memory allocation                           268435456 (256MiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        268435456 (256MiB)
  Max number of constant args                     16
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Available                                Yes
  Compiler Available                              Yes
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Clover
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [MESA]
  clCreateContext(NULL, ...) [default]            Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD PITCAIRN
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Clover
    Device Name                                   AMD PITCAIRN

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.3
  ICD loader Profile                              OpenCL 1.2
janchk commented 7 years ago

Lol. Same problem.

My clinfo output:

Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 13.1.0-devel (git-151aeca 2016-11-13 xenial-oibaf-ppa)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD OLAND (DRM 2.43.0 / 4.4.0-47-generic, LLVM 3.9.0)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 Mesa 13.1.0-devel (git-151aeca 2016-11-13 xenial-oibaf-ppa)
  Driver Version                                  13.1.0-devel
  Device OpenCL C Version                         OpenCL C 1.1 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               6
  Max clock frequency                             825MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
=== CL_PROGRAM_BUILD_LOG ===
<unknown>:0:0: in function sum void (float addrspace(1)*, float addrspace(1)*, float addrspace(1)*): unsupported call to function get_local_size
  Preferred work group size multiple              <unknown>:0:0: in function sum void (float addrspace(1)*, float addrspace(1)*, float addrspace(1)*): unsupported call to function get_local_size

  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 2 / 2       
    half                                                 0 / 0        (n/a)
    float                                                4 / 4       
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              2147483648 (2GiB)
  Error Correction support                        No
  Max memory allocation                           1503238553 (1.4GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        None
  Image support                                   No
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        1503238553 (1.4GiB)
  Max number of constant args                     16
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Profiling timer resolution                      0ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
  Device Available                                Yes
  Compiler Available                              Yes
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_fp64

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [MESA]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

My error log:

F1122 23:19:05.806424 21980 device.cpp:95] Cannot find any dGPU! 
*** Check failure stack trace: ***
    @     0x7f4c8ee2a5cd  google::LogMessage::Fail()
    @     0x7f4c8ee2c433  google::LogMessage::SendToLog()
    @     0x7f4c8ee2a15b  google::LogMessage::Flush()
    @     0x7f4c8ee2ce1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f4c8f4c4c5b  caffe::Device::Init()
    @           0x6f04d2  main
    @     0x7f4c8cf1c830  __libc_start_main
    @           0x6f3f59  _start
    @              (nil)  (unknown)
Aborted (core dumped)
src/caffe/test/CMakeFiles/runtest.dir/build.make:57: recipe for target 'src/caffe/test/CMakeFiles/runtest' failed
make[3]: *** [src/caffe/test/CMakeFiles/runtest] Error 134
CMakeFiles/Makefile2:328: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/all' failed
make[2]: *** [src/caffe/test/CMakeFiles/runtest.dir/all] Error 2
CMakeFiles/Makefile2:335: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/rule' failed
make[1]: *** [src/caffe/test/CMakeFiles/runtest.dir/rule] Error 2
Makefile:240: recipe for target 'runtest' failed
make: *** [runtest] Error 2
gujunli commented 7 years ago

It seems that you have a really old GPU with OpenCL 1.1. I suspect the error is caused by that incompatibility. Thanks, Junli
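To test the version hypothesis, compare the version strings clinfo prints. The Mesa devices in this thread report "OpenCL 1.1 MESA 10.6.9" and "OpenCL 1.1 Mesa 13.1.0-devel ...", while the APU earlier reports "OpenCL 2.0 AMD-APP (1729.3)". The OpenCL specification defines CL_DEVICE_VERSION as "OpenCL <major>.<minor> <vendor-specific information>", so a rough check can be sketched as follows. ParseOpenCLVersion and AtLeast are hypothetical helpers, not OpenCL Caffe functions. The required minimum version is also an assumption, because the thread does not say which version OpenCL Caffe needs:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Parses "OpenCL <major>.<minor> <vendor-specific>", the format the OpenCL
// specification defines for CL_DEVICE_VERSION / CL_PLATFORM_VERSION.
// Returns false if the string does not start with that pattern.
bool ParseOpenCLVersion(const std::string& version, int* major, int* minor) {
  return std::sscanf(version.c_str(), "OpenCL %d.%d", major, minor) == 2;
}

// Hypothetical check: true if the version string is at least
// req_major.req_minor. Unparseable strings are treated as "too old".
bool AtLeast(const std::string& version, int req_major, int req_minor) {
  int major = 0, minor = 0;
  if (!ParseOpenCLVersion(version, &major, &minor)) return false;
  return major > req_major || (major == req_major && minor >= req_minor);
}
```

For example, AtLeast("OpenCL 1.1 MESA 10.6.9", 1, 2) is false and AtLeast("OpenCL 2.0 AMD-APP (1729.3)", 1, 2) is true. The 1.2 threshold in this example is an illustration only.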

gujunli commented 7 years ago

Oh, the log shows a NULL platform pointer, so OpenCL Caffe did not find your platform. I suggest you open src/caffe/device.cpp, look at the logic in Device::Init(), and print out more information.
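Following that suggestion: the deviceId == -1 branch of Device::Init(), quoted later in this thread, treats every GPU that reports CL_DEVICE_HOST_UNIFIED_MEMORY as an integrated GPU and skips it. All three clinfo dumps in this thread report unified memory for their GPU: "Unified memory for Host and Device: 1" on the A10-7850K, and "Yes" for PITCAIRN and OLAND under Mesa. That loop would therefore skip every device, which is one plausible explanation for the error. Below is a minimal sketch of the same selection logic with each decision printed. GpuInfo and PickFirstDiscreteGpu are hypothetical names. The device attributes are plain fields rather than real clGetDeviceInfo calls, so the logic runs without an OpenCL runtime:

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Stand-in for the per-device attributes Device::Init() queries.
// In the real code the flag comes from clGetDeviceInfo with
// CL_DEVICE_HOST_UNIFIED_MEMORY; here it is a plain field.
struct GpuInfo {
  std::string name;
  bool host_unified_memory;
};

// Mirrors the deviceId == -1 branch: return the index of the first GPU that
// does NOT report unified host memory, or -1 if every GPU does (the case that
// ends in "Cannot find any dGPU!"). Each decision is printed to stderr.
int PickFirstDiscreteGpu(const std::vector<GpuInfo>& gpus) {
  for (std::size_t i = 0; i < gpus.size(); ++i) {
    std::cerr << "GPU " << i << " (" << gpus[i].name << "): unified memory = "
              << (gpus[i].host_unified_memory ? "yes" : "no") << '\n';
    if (!gpus[i].host_unified_memory) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

With a single entry whose host_unified_memory is true, as in the Mesa dumps above, the function returns -1.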

Sent from my iPhone

On Nov 22, 2016, at 12:45 PM, gujunli gujunli@gmail.com wrote:

It seems that you have a really old GPU with OpenCL 1.1. I suspect the error is caused by the incompatibility Thanks Junli

Sent from my iPhone

On Nov 22, 2016, at 12:29 PM, janchk notifications@github.com wrote:

Lol. Same problem.

My clinfo

`Number of platforms 1 Platform Name Clover Platform Vendor Mesa Platform Version OpenCL 1.1 Mesa 13.1.0-devel (git-151aeca 2016-11-13 xenial-oibaf-ppa) Platform Profile FULL_PROFILE Platform Extensions cl_khr_icd Platform Extensions function suffix MESA

Platform Name Clover Number of devices 1 Device Name AMD OLAND (DRM 2.43.0 / 4.4.0-47-generic, LLVM 3.9.0) Device Vendor AMD Device Vendor ID 0x1002 Device Version OpenCL 1.1 Mesa 13.1.0-devel (git-151aeca 2016-11-13 xenial-oibaf-ppa) Driver Version 13.1.0-devel Device OpenCL C Version OpenCL C 1.1 Device Type GPU Device Profile FULL_PROFILE Max compute units 6 Max clock frequency 825MHz Max work item dimensions 3 Max work item sizes 256x256x256 Max work group size 256 === CL_PROGRAM_BUILD_LOG === :0:0: in function sum void (float addrspace(1), float addrspace(1), float addrspace(1)): unsupported call to function get_local_size Preferred work group size multiple :0:0: in function sum void (float addrspace(1), float addrspace(1), float addrspace(1)): unsupported call to function get_local_size

Preferred / native vector sizes char 16 / 16 short 8 / 8 int 4 / 4 long 2 / 2 half 0 / 0 (n/a) float 4 / 4 double 2 / 2 (cl_khr_fp64) Half-precision Floating-point support (n/a) Single-precision Floating-point support (core) Denormals No Infinity and NANs Yes Round to nearest Yes Round to zero No Round to infinity No IEEE754-2008 fused multiply-add No Support is emulated in software No Correctly-rounded divide and sqrt operations No Double-precision Floating-point support (cl_khr_fp64) Denormals Yes Infinity and NANs Yes Round to nearest Yes Round to zero Yes Round to infinity Yes IEEE754-2008 fused multiply-add Yes Support is emulated in software No Correctly-rounded divide and sqrt operations No Address bits 64, Little-Endian Global memory size 2147483648 (2GiB) Error Correction support No Max memory allocation 1503238553 (1.4GiB) Unified memory for Host and Device Yes Minimum alignment for any data type 128 bytes Alignment of base address 1024 bits (128 bytes) Global Memory cache type None Image support No Local memory type Local Local memory size 32768 (32KiB) Max constant buffer size 1503238553 (1.4GiB) Max number of constant args 16 Max size of kernel argument 1024 Queue properties Out-of-order execution No Profiling Yes Profiling timer resolution 0ns Execution capabilities Run OpenCL kernels Yes Run native kernels No Device Available Yes Compiler Available Yes Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_fp64

NULL platform behavior clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) No platform clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) No platform clCreateContext(NULL, ...) [default] No platform clCreateContext(NULL, ...) [other] Success [MESA] clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No platform clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform `

My error log

`F1122 23:19:05.806424 21980 device.cpp:95] Cannot find any dGPU! * Check failure stack trace: @ 0x7f4c8ee2a5cd google::LogMessage::Fail() @ 0x7f4c8ee2c433 google::LogMessage::SendToLog() @ 0x7f4c8ee2a15b google::LogMessage::Flush() @ 0x7f4c8ee2ce1e google::LogMessageFatal::~LogMessageFatal() @ 0x7f4c8f4c4c5b caffe::Device::Init() @ 0x6f04d2 main @ 0x7f4c8cf1c830 __libc_start_main @ 0x6f3f59 _start @ (nil) (unknown) Aborted (core dumped) src/caffe/test/CMakeFiles/runtest.dir/build.make:57: recipe for target 'src/caffe/test/CMakeFiles/runtest' failed make[3]: \ [src/caffe/test/CMakeFiles/runtest] Error 134 CMakeFiles/Makefile2:328: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/all' failed make[2]: * [src/caffe/test/CMakeFiles/runtest.dir/all] Error 2 CMakeFiles/Makefile2:335: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/rule' failed make[1]: * [src/caffe/test/CMakeFiles/runtest.dir/rule] Error 2 Makefile:240: recipe for target 'runtest' failed make: * [runtest] Error 2

`


janchk commented 7 years ago

gujunli, do you mean this?

cl_int Device::Init(int deviceId) {

  DisplayPlatformInfo();

  clGetPlatformIDs(0, NULL, &numPlatforms);
  cl_platform_id PlatformIDs[numPlatforms];
  clGetPlatformIDs(numPlatforms, PlatformIDs, NULL);

  size_t nameLen;
  cl_int res = clGetPlatformInfo(PlatformIDs[0], CL_PLATFORM_NAME, 64,
      platformName, &nameLen);
  if (res != CL_SUCCESS) {
    fprintf(stderr, "Err: Failed to Get Platform Info\n");
    return 0;
  }
  platformName[nameLen] = 0;

  GetDeviceInfo();
  cl_uint uiNumDevices;
  cl_bool unified_memory = false;
  clGetDeviceIDs(PlatformIDs[0], CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);
  uiNumDevices = numDevices;
  if (0 == uiNumDevices) {
    LOG(FATAL) << "Err: No GPU devices";
  } else {
    pDevices = (cl_device_id *) malloc(uiNumDevices * sizeof(cl_device_id));
    OCL_CHECK(
        clGetDeviceIDs(PlatformIDs[0], CL_DEVICE_TYPE_GPU, uiNumDevices,
            pDevices, &uiNumDevices));
    if (deviceId == -1) {
      int i;
      for (i = 0; i < (int) uiNumDevices; i++) {
        clGetDeviceInfo(pDevices[i], CL_DEVICE_HOST_UNIFIED_MEMORY,
            sizeof(cl_bool), &unified_memory, NULL);
        if (!unified_memory) { //skip iGPU
          //we pick the first dGPU we found
          pDevices[0] = pDevices[i];
          device_id = i;
          LOG(INFO) << "Picked default device type : dGPU " << device_id;
          break;
        }
      }
      if (i == uiNumDevices) {
        LOG(FATAL) << "Cannot find any dGPU! ";
      }
    } else if (deviceId >= 0 && deviceId < uiNumDevices) {
      pDevices[0] = pDevices[deviceId];
      device_id = deviceId;
      LOG(INFO) << "Picked device type : GPU " << device_id;
    } else {
      LOG(FATAL) << "  Invalid GPU deviceId! ";
    }
  }

  Context = clCreateContext(NULL, 1, pDevices, NULL, NULL, NULL);
  if (NULL == Context) {
    fprintf(stderr, "Err: Failed to Create Context\n");
    return 0;
  }
  CommandQueue = clCreateCommandQueue(Context, pDevices[0],
      CL_QUEUE_PROFILING_ENABLE, NULL);
  CommandQueue_helper = clCreateCommandQueue(Context, pDevices[0],
      CL_QUEUE_PROFILING_ENABLE, NULL);
  if (NULL == CommandQueue || NULL == CommandQueue_helper) {
    fprintf(stderr, "Err: Failed to Create Commandqueue\n");
    return 0;
  }
  BuildProgram (oclKernelPath);
  row = clblasRowMajor;
  col = clblasColumnMajor;
  return 0;
}

Probably it can't get into

for (i = 0; i < (int) uiNumDevices; i++) {
        clGetDeviceInfo(pDevices[i], CL_DEVICE_HOST_UNIFIED_MEMORY,
            sizeof(cl_bool), &unified_memory, NULL);
        if (!unified_memory) { //skip iGPU
          //we pick the first dGPU we found
          pDevices[0] = pDevices[i];
          device_id = i;
          LOG(INFO) << "Picked default device type : dGPU " << device_id;
          break;
        }

Because uiNumDevices == 0, which comes from clGetDeviceIDs. But I have no idea how to deal with this. BTW, I have Ubuntu 16.04 and a Radeon HD 8750M, if that matters.

janchk commented 7 years ago

Or maybe OCL_CHECK fails for &uiNumDevices?

gujunli commented 7 years ago

You can insert a few printfs in this file to trace whether the GPUs are detected correctly, e.g. first print out the number of devices:

clGetDeviceIDs(PlatformIDs[0], CL_DEVICE_TYPE_GPU, 0, NULL, &numDevices);
uiNumDevices = numDevices;
printf("#device %u \n", uiNumDevices);  // cl_uint is unsigned, hence %u
if (0 == uiNumDevices) {
  LOG(FATAL) << "Err: No GPU devices";
} else {
  pDevices = (cl_device_id *) malloc(uiNumDevices * sizeof(cl_device_id));
  // ... rest of Device::Init() unchanged
}


janchk commented 7 years ago

@gujunli I've found the problem. The variable unified_memory == 1, so execution never gets into

if (!unified_memory) { //skip iGPU
          //we pick the first dGPU we found
          pDevices[0] = pDevices[i];
          device_id = i;
          LOG(INFO) << "Picked default device type : dGPU " << device_id;
          break;

What does //skip iGPU mean?

gujunli commented 7 years ago

I see. So your GPU is an integrated GPU. OpenCL Caffe is set by default to look for a discrete GPU. You can comment out the unified_memory logic, use whatever GPU it finds, and see whether it works. Junli
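Rather than deleting the unified-memory check outright, the selection step could keep preferring a discrete GPU and fall back to the first GPU only when none is found. The sketch below is a hypothetical, standalone helper (the name pick_gpu and its vector-of-flags input are assumptions, not OpenCL-caffe code) that isolates just that decision, so it compiles without OpenCL headers:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of OpenCL-caffe. host_unified[i] holds the
// CL_DEVICE_HOST_UNIFIED_MEMORY flag that clGetDeviceInfo() reported for the
// i-th device returned by clGetDeviceIDs(..., CL_DEVICE_TYPE_GPU, ...).
//
// Returns the index of the first GPU without host-unified memory (treated as
// a discrete GPU, like the original loop), otherwise 0 so that an integrated
// or misreporting GPU is still used, or -1 if no GPU was found at all.
int pick_gpu(const std::vector<bool>& host_unified) {
  for (std::size_t i = 0; i < host_unified.size(); ++i) {
    if (!host_unified[i]) {
      return static_cast<int>(i);  // first device that looks like a dGPU
    }
  }
  if (!host_unified.empty()) {
    return 0;  // no dGPU: fall back to the first GPU instead of aborting
  }
  return -1;  // no GPU devices ("Err: No GPU devices")
}
```

Inside Device::Init() one would fill the flags in the existing loop and log a warning, rather than LOG(FATAL), when the fallback is taken. This only gets past the "Cannot find any dGPU!" check; it cannot help if the driver itself is broken.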


janchk commented 7 years ago

@gujunli Here is what I got after commenting out if (!unified_memory):

Current device id: 0
#device_afterclget 1 
#device_afteroclcheck 1 
#device_beforecycle 1 
unified 1 
Err: Failed to build program
Note: Google Test filter = -*GPU*
Note: Randomizing tests' orders with a seed of 37283 .
[==========] Running 1297 tests from 201 test cases.
[----------] Global test environment set-up.
[----------] 2 tests from SoftmaxLayerTest/2, where TypeParam = caffe::GPUDevice<float>
[ RUN      ] SoftmaxLayerTest/2.TestGradient
#device_afterclget 1 
#device_afteroclcheck 1 
#device_beforecycle 1 
unified 1 
Err: Failed to build program
*** Aborted at 1479990162 (unix time) try "date -d @1479990162" if you are using GNU date ***
PC: @     0x7f27137fe578 clCreateKernel
*** SIGSEGV (@0x110) received by PID 29204 (TID 0x7f271712bac0) from PID 272; stack trace: ***
    @     0x7f27168403e0 (unknown)
    @     0x7f27137fe578 clCreateKernel
    @     0x7f2716cb32a2 caffe::SyncedMemory::ocl_setup()
    @     0x7f2716caea4b caffe::Blob<>::Reshape()
    @     0x7f2716caec6f caffe::Blob<>::Reshape()
    @     0x7f2716caed0c caffe::Blob<>::Blob()
    @           0x78ed8c caffe::SoftmaxLayerTest<>::SoftmaxLayerTest()
    @           0x78efbb testing::internal::TestFactoryImpl<>::CreateTest()
    @           0xa7bfb3 testing::internal::HandleExceptionsInMethodIfSupported<>()
    @           0xa74c93 testing::TestInfo::Run()
    @           0xa74e25 testing::TestCase::Run()
    @           0xa769bf testing::internal::UnitTestImpl::RunAllTests()
    @           0xa76ce3 testing::UnitTest::Run()
    @           0x6f04df main
    @     0x7f27146fd830 __libc_start_main
    @           0x6f3f59 _start
    @                0x0 (unknown)
Segmentation fault (core dumped)
src/caffe/test/CMakeFiles/runtest.dir/build.make:57: recipe for target 'src/caffe/test/CMakeFiles/runtest' failed
make[3]: *** [src/caffe/test/CMakeFiles/runtest] Error 139
CMakeFiles/Makefile2:328: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/all' failed
make[2]: *** [src/caffe/test/CMakeFiles/runtest.dir/all] Error 2
CMakeFiles/Makefile2:335: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/rule' failed
make[1]: *** [src/caffe/test/CMakeFiles/runtest.dir/rule] Error 2
Makefile:240: recipe for target 'runtest' failed
make: *** [runtest] Error 2

Here is what lspci | grep VGA shows on my system:

00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Mars [Radeon HD 8670A/8670M/8750M] (rev ff)

I want to use caffe with my AMD discrete graphics.

I also tried DRI_PRIME=1 make runtest; it had no effect.

naibaf7 commented 7 years ago

Maybe try this, since the AMD branch is no longer actively maintained: https://github.com/bvlc/caffe/tree/opencl

janchk commented 7 years ago

@naibaf7 I tried this branch and have no idea what to do.

F1124 19:42:10.999312 22298 syncedmem.cpp:201] Check failed: mapped_ptr == cpu_ptr_ (0x7fb1e00db000 vs. 0x208ab20) Device claims it support zero copy but failed to create correct user ptr buffer
*** Check failure stack trace: ***
    @     0x7fb1dee3c5cd  google::LogMessage::Fail()
    @     0x7fb1dee3e433  google::LogMessage::SendToLog()
    @     0x7fb1dee3c15b  google::LogMessage::Flush()
    @     0x7fb1dee3ee1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fb1dfc44bae  caffe::SyncedMemory::mutable_gpu_data()
    @           0xb93f06  caffe::RandomNumberGeneratorTest_TestRngUniformTimesUniformGPU_Test<>::TestBody_Impl()
    @           0xdedaf3  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @           0xde68aa  testing::Test::Run()
    @           0xde69f8  testing::TestInfo::Run()
    @           0xde6b05  testing::TestCase::Run()
    @           0xde858f  testing::internal::UnitTestImpl::RunAllTests()
    @           0xde88c3  testing::UnitTest::Run()
    @           0x8be529  main
    @     0x7fb1dcd23830  __libc_start_main
    @           0x8c4ce9  _start
    @              (nil)  (unknown)
Aborted (core dumped)
src/caffe/test/CMakeFiles/runtest.dir/build.make:57: recipe for target 'src/caffe/test/CMakeFiles/runtest' failed
make[3]: *** [src/caffe/test/CMakeFiles/runtest] Error 134
CMakeFiles/Makefile2:328: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/all' failed
make[2]: *** [src/caffe/test/CMakeFiles/runtest.dir/all] Error 2
CMakeFiles/Makefile2:335: recipe for target 'src/caffe/test/CMakeFiles/runtest.dir/rule' failed
make[1]: *** [src/caffe/test/CMakeFiles/runtest.dir/rule] Error 2
Makefile:240: recipe for target 'runtest' failed
make: *** [runtest] Error 2

This is what I got, with a lot of output above it. Could Oibaf's driver be the reason for this?

naibaf7 commented 7 years ago

@janchk Yes, if BOTH branches fail, it's definitely a driver problem. The driver seems to report unified (zero-copy) memory between your CPU and GPU, but that is not actually the case, so it's broken. It looks like you are using the CLOVER OpenCL implementation. This never worked for me, and I usually disable it on my system. Use an FGLRX or AMDGPU-PRO implementation instead for better results.

janchk commented 7 years ago

@naibaf7 As far as I know, FGLRX does not work on Ubuntu 16.04, and AMDGPU-PRO does not support my graphics card (Radeon HD 8750M). Should I use an older version of Ubuntu instead?

gujunli commented 7 years ago

Try Ubuntu 14.04; this is the most stable version for Caffe.


naibaf7 commented 7 years ago

@janchk Unfortunately yes: Ubuntu 14.04, until the AMDGPU-PRO driver becomes available for the Mars (HD 8750M) chip. You might be able to make it work if you find a firmware binary (https://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/tree/amdgpu lists the supported chipsets) and compile your own kernel with AMDGPU CIK support. Another option is an alternative OS like Fedora 25 with a downgraded XORG (1.7) and kernel (4.4) and a modified FGLRX, but that is also highly complicated. I can send you the modified FGLRX and instructions if you want.

Or use Windows: the Caffe OpenCL branch will be available on Windows by the end of January at the latest.

janchk commented 7 years ago

@naibaf7 I would like to try the method you suggested, if it is not too much trouble for you.

naibaf7 commented 7 years ago

Here is something that works on Fedora 23/24/25, instructions and all: https://github.com/imageguy/fglrx-for-Fedora You basically just need to install Fedora as you like, and downgrade with:

sudo dnf downgrade --allowerasing --releasever=21 xorg-x11-server-Xorg xorg-x11-server-common

Make sure to never update those packages or you'll get no GPU. Also download some 4.7.x kernel from here (final version, no RC kernel): http://koji.fedoraproject.org/koji/packageinfo?packageID=8 and install it before installing the AMD GPU driver.

This is how you install the driver in the ZIP below:

./ati-installer.sh 15.302 --install

Here is a driver I already patched that you can try: http://tingy.pw/fglrx.zip (won't be up forever...). As far as I remember, it was patched for kernels 4.4 to 4.7, but you'll have to try and see what happens. It's OpenCL 2.0 and actually still faster than AMDGPU-PRO and CLOVER... ;)

janchk commented 7 years ago

@naibaf7 Here is my installation progress. Probably I made some mistakes.
1) Installed Fedora 25
2) Downgraded xorg
3) Installed kernel-4.7.9-200.fc24.x86_64.rpm (core, modules, main)
4) Booted with this kernel
5) Tried ./ati-installer.sh 15.302 --install and got the error "Please install the required pre-requisites before". I found them for Ubuntu but not for Fedora.
6) Tried again with --force and got another error.
P.S. Maybe I should move this to another thread.

naibaf7 commented 7 years ago

@janchk What pre-requisites is it complaining about? What is the error with force? The steps you did seem correct so far!

janchk commented 7 years ago

@naibaf7 This is what I get when I install without --force:

Supported adapter detected.
Check if system has the tools required for installation.
Uninstalling any previously installed drivers.
Unloading drm module...
rmmod: ERROR: Module drm is in use by: i915 drm_kms_helper
[Message] Kernel Module : Trying to install a precompiled kernel module.
[Message] Kernel Module : Precompiled kernel module version mismatched.
[Message] Kernel Module : Found kernel module build environment, generating kernel module now.
AMD kernel module generator version 2.1
Error:
kernel includes at /lib/modules/4.7.9-200.fc24.x86_64/build/include do not match current kernel.
they are versioned as ""
instead of "4.7.9-200.fc24.x86_64".
you might need to adjust your symlinks:
- /usr/include
- /usr/src/linux
[Error] Kernel Module : Failed to compile kernel module - please consult readme.
[Reboot] Kernel Module : dracut

I solved the pre-requisites problem, and this is what I got. I get the same log with the --force flag.

naibaf7 commented 7 years ago

Oooh, this already looks quite good. Regarding step 3 (install kernel-4.7.9-200.fc24.x86_64.rpm (core, modules, main)): you might have forgotten to install the kernel-headers and/or kernel-devel packages for this kernel. What does uname -r give you?
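The installer error above complains about the kernel includes under /lib/modules/<release>/build. As a rough pre-flight check before rerunning the installer, one could confirm that this build tree exists for the running kernel. The sketch below is hypothetical (the function names are made up and it does not reproduce the AMD installer's own version test); it assumes, as is typical on Fedora, that build is a symlink into the tree installed by kernel-devel, so the link dangles when that package is missing:

```cpp
#include <string>
#include <sys/stat.h>
#include <sys/utsname.h>

// What `uname -r` prints, e.g. "4.7.9-200.fc24.x86_64"; "" if uname(2) fails.
std::string running_kernel_release() {
  struct utsname info;
  if (uname(&info) != 0) {
    return "";
  }
  return info.release;
}

// Conventional location of the build tree that out-of-tree kernel modules
// (such as fglrx) are compiled against.
std::string kernel_build_dir(const std::string& release) {
  return "/lib/modules/" + release + "/build";
}

// stat() follows symlinks, so a "build" link whose target is missing
// (e.g. because kernel-devel is not installed) is reported as absent.
bool kernel_build_dir_present(const std::string& release) {
  struct stat st;
  return stat(kernel_build_dir(release).c_str(), &st) == 0 &&
         S_ISDIR(st.st_mode);
}
```

Calling kernel_build_dir_present(running_kernel_release()) and getting false would point to the missing kernel-devel package; getting true does not guarantee the installer will accept the headers.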

janchk commented 7 years ago

@naibaf7 Yup, I didn't install either of them. uname -r returns 4.7.9-200.fc24.x86_64.

janchk commented 7 years ago

@naibaf7 After reinstalling the system and repeating the other steps, I got this fglrx-install.log. Then GNOME crashed and forced me to reboot, and the system got stuck at the boot log.

naibaf7 commented 7 years ago

@janchk Hmm, then the driver was not patched up to 4.7 before installation, sorry. The chunks that fail during compilation should all be handled by the patches (.diff) from here: https://github.com/imageguy/fglrx-for-Fedora. If you still have the nerve, you can uninstall fglrx from rescue or console mode (using the installer) and reinstall it after applying those patches.

janchk commented 7 years ago

@naibaf7 This is what happens when I try to patch the fglrx you sent:

[root@localhost fglrx-install.MtTczW]# patch -p1 </root/Downloads/fglrx-for-Fedora-master/fglrx_kernel_4.7.diff
patching file common/lib/modules/fglrx/build_mod/firegl_public.c
Hunk #1 succeeded at 615 (offset -16 lines).
Hunk #2 succeeded at 3200 (offset -20 lines).
Hunk #3 succeeded at 3218 (offset -20 lines).
Hunk #4 succeeded at 3229 (offset -20 lines).
Hunk #5 succeeded at 3405 (offset -20 lines).
Hunk #6 succeeded at 3415 (offset -20 lines).
Hunk #7 succeeded at 4502 with fuzz 2 (offset -20 lines).
Hunk #8 succeeded at 4525 with fuzz 2 (offset -15 lines).
Hunk #9 succeeded at 4560 with fuzz 2 (offset -11 lines).
Hunk #10 succeeded at 4582 with fuzz 2 (offset -6 lines).
Hunk #11 FAILED at 6475.
1 out of 11 hunks FAILED -- saving rejects to file common/lib/modules/fglrx/build_mod/firegl_public.c.rej
patching file common/lib/modules/fglrx/build_mod/firegl_public.h

And this is what I got when patching the driver from here: https://github.com/imageguy/fglrx-for-Fedora

[root@localhost fglrx-install.f1TQFY]# patch -p1 </root/Downloads/fglrx-for-Fedora-master/fglrx_kernel_4.7.diff
patching file common/lib/modules/fglrx/build_mod/firegl_public.c
patching file common/lib/modules/fglrx/build_mod/firegl_public.h

Should I worry about "Hunk #11 FAILED at 6475"?

Update: yes, I should. Another boot failed; here is the second log, fglrx-install1.txt.

naibaf7 commented 7 years ago

Hmm, the only reason I can imagine for it still failing after the latest patches is that the kernel (XSTATE_FP, __fgl_cmpxchg, fpu_xsave(fpu)) changed too substantially between 4.7.2 (the latest tested kernel) and 4.7.9. It seems that taking the 4.7.2-FC24 kernel is the last thing you can try :( no one bothered to update the patches after that.

janchk commented 7 years ago

@naibaf7 Thank you for your support. I have solved the problem. I used Ubuntu 15.10, the latest AMD drivers from the official site, and your branch of Caffe with ViennaCL. Oh, so much faster than computing on the CPU. [Screenshot from 2016-12-03 16-58-12]

naibaf7 commented 7 years ago

@janchk Cool! Are you happy with the performance? You can try to enable LibDNN compilation in the Makefile and switch from ViennaCL to clBLAS (from AMD) to get even more performance out of it :)

janchk commented 7 years ago

@naibaf7 MANY thanks to you, dude! With my old configuration I ran into the error F1203 19:00:07.040632 5162 syncedmem.cpp:215] Check failed: 0 == err (0 vs. -61) OpenCL buffer allocation of size 2707024896 failed. when trying to handle big files. Following your advice, I fixed it! And much more performance, of course. P.S. I would be glad to test your Windows branch once it is released.

naibaf7 commented 7 years ago

@janchk OK, watch out for updates on my OpenCL branch, it will show when Windows is ready :) Glad I could help, enjoy Caffe :)

It is true that F1203 19:00:07.040632 5162 syncedmem.cpp:215] Check failed: 0 == err (0 vs. -61) OpenCL buffer allocation of size 2707024896 failed. happens because you run out of GPU memory. But LibDNN uses much less memory for convolutions than the Caffe engine, so you can load bigger models. It's similar to NVIDIA's cuDNN.
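Assuming the failing call in syncedmem.cpp is clCreateBuffer, error -61 is CL_INVALID_BUFFER_SIZE in the Khronos CL/cl.h header, which the OpenCL specification returns when the requested size is 0 or larger than CL_DEVICE_MAX_MEM_ALLOC_SIZE. That per-allocation limit can be well below the total global memory (in the clinfo dump earlier in this thread it is 1.4GiB out of 2GiB), so a single 2707024896-byte buffer (about 2.52 GiB) can be rejected before the card is actually full. The sketch below just restates that rule; the helper names are hypothetical, and the limit is only an example value taken from that dump, which came from a different driver setup:

```cpp
#include <cstdint>

// Numbers from this thread: the failed allocation in the F1203 message and
// the "Max memory allocation" line of the clinfo dump (example limit only).
constexpr std::uint64_t kRequestedBytes = 2707024896ULL;
constexpr std::uint64_t kExampleMaxAllocBytes = 1503238553ULL;

// clCreateBuffer fails with CL_INVALID_BUFFER_SIZE (-61) when size is 0 or
// exceeds CL_DEVICE_MAX_MEM_ALLOC_SIZE; this mirrors that condition.
bool buffer_size_valid(std::uint64_t size, std::uint64_t max_alloc) {
  return size != 0 && size <= max_alloc;
}

// Converts bytes to GiB (1 GiB = 2^30 bytes).
double to_gib(std::uint64_t bytes) {
  return static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0);
}
```

On a real device the limit would come from clGetDeviceInfo with CL_DEVICE_MAX_MEM_ALLOC_SIZE. Lower-memory convolution paths such as LibDNN help here because they need smaller individual buffers, not only less memory in total.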

dasha-5555-5 commented 7 years ago

Hi! I have this problem when I try to run runtest on Ubuntu 16.04:

make runtest
[100%] Built target proto
[100%] Built target caffe
[100%] Built target gtest
[100%] Linking CXX executable ../../../test/test.testbin
[100%] Built target test.testbin
Current device id: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
device 1
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
X server found. dri2 connection failed! DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument Assuming 131072kB available aperture size. May lead to reduced performance or incorrect rendering. get chip id failed: -1 [22] param: 4, val: 0
F0521 19:01:47.042328 4111 device.cpp:96] Cannot find any dGPU!
Check failure stack trace:
    @ 0x7f88209955cd google::LogMessage::Fail()
    @ 0x7f8820997433 google::LogMessage::SendToLog()
    @ 0x7f882099515b google::LogMessage::Flush()
    @ 0x7f8820997e1e google::LogMessageFatal::~LogMessageFatal()
    @ 0x7f8820fcdd3b caffe::Device::Init()
    @ 0x6f1742 main
    @ 0x7f881e6d2830 __libc_start_main
    @ 0x6f3f09 _start
    @ (nil) (unknown)
Aborted (core dumped)

Any help would be appreciated.