BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

ViennaCL: FATAL ERROR: Could not find kernel #5804

Open innnk opened 7 years ago

innnk commented 7 years ago

The error occurs when running "make runtest":

OpenCL error -11 on line 244 of /tmp/yaourt-tmp-innnk/aur-clblas/src/clBLAS-2.12/src/library/blas/xgemm.cc
clBuildProgram Failed
err = -11
Error: Failed to build program executable!

Build Log:

input.cl:104:53: warning: double precision constant requires cl_khr_fp64, casting to single precision
input.cl:111:5: warning: implicit declaration of function 'mem_fence' is invalid in C99
input.cl:37:3: note: expanded from macro 'MICRO_TILE'
<unknown>:0:0: in function sgemm_Col_NN_B0_MX016_NL016_KX01 void (float addrspace(1)*, float addrspace(1)*, float addrspace(1)*, float, float, i32, i32, i32, i32, i32, i32, i32, i32, i32): unsupported call to function mem_fence

OpenCL error -11 on line 244 of /tmp/yaourt-tmp-innnk/aur-clblas/src/clBLAS-2.12/src/library/blas/xgemm.cc
clBuildProgram Failed
err = -11
Error: Failed to build program executable!

Build Log:

input.cl:101:54: warning: double precision constant requires cl_khr_fp64, casting to single precision
input.cl:104:53: warning: double precision constant requires cl_khr_fp64, casting to single precision
input.cl:111:5: warning: implicit declaration of function 'mem_fence' is invalid in C99
input.cl:37:3: note: expanded from macro 'MICRO_TILE'
<unknown>:0:0: in function sgemm_Col_NN_B0_ML016_NL016_KX01 void (float addrspace(1)*, float addrspace(1)*, float addrspace(1)*, float, float, i32, i32, i32, i32, i32, i32, i32, i32, i32): unsupported call to function mem_fence

F0727 23:55:31.528924 17189 greentea_math_functions.cpp:225] Check failed: status == clblasSuccess (-48 vs. 0) GREENTEA ERROR: clBLAS error
*** Check failure stack trace: ***
    @     0x7f440271287d  google::LogMessage::Fail()
    @     0x7f44027146e3  google::LogMessage::SendToLog()
    @     0x7f44027123d8  google::LogMessage::Flush()
    @     0x7f4402715149  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f43faea7aaf  caffe::greentea_gpu_gemm<>()
    @     0x7f43faf4c81d  caffe::MVNLayer<>::Forward_gpu()
    @     0x5630c8579622  (unknown)
    @     0x5630c85cefd2  (unknown)
    @     0x5630c8a134da  (unknown)
    @     0x5630c8a0c91a  (unknown)
    @     0x5630c8a0c9fc  (unknown)
    @     0x5630c8a0cb35  (unknown)
    @     0x5630c8a0cff0  (unknown)
    @     0x5630c8a0d137  (unknown)
    @     0x5630c854e2f9  (unknown)
    @     0x7f43f9e354ca  __libc_start_main
    @     0x5630c85593ca  (unknown)
make: *** [Makefile:673: runtest] Aborted (core dumped)
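
The numeric codes in the log can be looked up in the OpenCL headers: -11 is CL_BUILD_PROGRAM_FAILURE, and clBLAS reuses OpenCL's numbering for -48 (CL_INVALID_KERNEL), which fits a kernel that never got built. A minimal Python sketch of that lookup follows. The table is only a small subset of the codes in CL/cl.h, and the helper name describe_cl_status is made up for illustration:

```python
# Small subset of the status codes defined in CL/cl.h.
# clBLAS status values reuse the same numbering for these entries.
CL_STATUS_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -44: "CL_INVALID_PROGRAM",
    -45: "CL_INVALID_PROGRAM_EXECUTABLE",
    -46: "CL_INVALID_KERNEL_NAME",
    -48: "CL_INVALID_KERNEL",
}


def describe_cl_status(code):
    """Return the symbolic name for an OpenCL status code (hypothetical helper)."""
    return CL_STATUS_NAMES.get(code, "unknown status {}".format(code))


if __name__ == "__main__":
    # The two codes that appear in the log above.
    for code in (-11, -48):
        print(code, describe_cl_status(code))
```

Read this way, the clBLAS failure (-48) looks like a consequence of the earlier build failure (-11) rather than a separate problem.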

This is my "Makefile.config"

# Enable the CUDA backend
# USE_CUDA := 1

USE_LIBDNN := 1

# Folder of the ViennaCL header-only library
VIENNACL_DIR = /usr/include/

USE_CLBLAS := 1
# Custom clBLAS lib and include directories.
CLBLAS_INCLUDE := /usr/include/clBLAS/
CLBLAS_LIB     := /usr/lib/

# open for OpenBlas
BLAS := open
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
BLAS_INCLUDE := /usr/include/
BLAS_LIB := /usr/lib/

During "make runtest", there are also many errors like:

ViennaCL: FATAL ERROR: Could not find kernel 'threshold_double' from program 'kernel_program'
Number of kernels in program: 0
unknown file: Failure

Moreover, when I run clinfo, it shows that 2 devices are available, GPU included.

Any suggestion will be appreciated. Thanks a lot~~

Your system configuration

Operating system: Manjaro
CPU: AMD Phenom(tm) II X4 B70 Processor
GPU: AMD HD 7850
Compiler:
CUDA version (if applicable):
CUDNN version (if applicable):
BLAS: ViennaCL (clBLAS)
Python or MATLAB version (for pycaffe and matcaffe respectively):

znmeb commented 7 years ago

This is probably upstream - I have an AMD GPU and I'm running Arch Linux (what Manjaro is based on) and I've seen similar issues. Pretty much everything I've tried that uses OpenCL on my system crashes in some way or another. Even low level tests like clpeak crash.

I traced a few bugs back to the Mesa OpenCL drivers' Bugzilla but there aren't people working on them. What you may have to do is:

  1. Install Ubuntu 16.04 LTS. This (and CentOS / RHEL 7) are the only Linux distros AMD supports.
  2. Download and install the Ubuntu SDK from AMD's support site.
  3. Join their community forums.

It still may not work with Ubuntu and the AMD-supported drivers; I had similar problems when I ran Ubuntu on my hardware. But at least you'll be able to get some support from the AMD community.

I've been trying to get OpenCL to work on my AMD GPU on and off for months, first on Fedora, then Ubuntu 16.04 LTS and now on Arch. I've essentially given up and am waiting for GPU community projects like caffe to migrate to Vulkan, which appears to be better supported. You're still likely to need to switch to Ubuntu 16.04; there's not much testing happening with other distros.

One other note: a few days ago I downloaded the ViennaCL manual, and it calls out a few bugs in AMD OpenCL implementations.

znmeb commented 7 years ago

Here's the tracking bug at Freedesktop.org for OpenCL bugs on Clover (Mesa on AMD / Radeon):

https://bugs.freedesktop.org/show_bug.cgi?id=99553

innnk commented 7 years ago

@znmeb Thank you for your suggestion! When I run clinfo, it shows that 2 devices are available, GPU included. Does that show OpenCL works with my GPU?

znmeb commented 7 years ago

Try clpeak - it's in the AUR (aur/clpeak-git on Arch but you should be able to get it from Manjaro). On my system it crashes.

znmeb commented 7 years ago

By the way - I tried building caffe from source. There are three options for the BLAS: ViennaCL, CLBlast and clBLAS. They all crashed IIRC but I can run them again if this is actually a caffe bug and not something in opencl-mesa.

I may try dual-booting the machine with Ubuntu 16.04 LTS and running the AMD proprietary suite. They have a version of caffe that doesn't use OpenCL, I think, and if something's broken there I might be able to get it fixed.

naibaf7 commented 7 years ago

OpenCL-Mesa is not supported by OpenCL Caffe - you need to use the OpenCL supplied by FGLRX or AMDGPU-PRO (most stable together with kernel 4.12), Intel OpenCL (Beignet or proprietary), or nVidia OpenCL.
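
This also bears on the clinfo question: a listed device only says a platform was found, not which implementation provides it. A rough Python sketch for spotting a Clover-only setup in clinfo output follows. It assumes clinfo prints lines of the form "Platform Name   <name>" (the layout can differ between clinfo versions), and the sample text and function names are invented for illustration:

```python
import re


def platform_names(clinfo_text):
    """Collect distinct platform names from clinfo-style text, in order of appearance."""
    names = []
    pattern = r"^[ \t]*Platform Name[ \t]+(.+?)[ \t]*$"
    for match in re.finditer(pattern, clinfo_text, re.MULTILINE):
        name = match.group(1)
        if name not in names:
            names.append(name)
    return names


def has_only_clover(clinfo_text):
    """True if at least one platform is listed and every one of them is Clover (Mesa)."""
    names = platform_names(clinfo_text)
    return bool(names) and all("clover" in n.lower() for n in names)


# Hypothetical excerpt, not taken from the reporter's machine.
SAMPLE = """\
Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
"""

if __name__ == "__main__":
    print(platform_names(SAMPLE))
    print("Clover only:", has_only_clover(SAMPLE))
```

To use it on a real system, the text would come from running clinfo and capturing its standard output.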

znmeb commented 7 years ago

AMDGPU-PRO is only supported on Ubuntu 16.04 LTS. When I was running Ubuntu 16.04 LTS, I tried numerous benchmarks with it and they all failed. I gave up on it. If there's a public bug tracker with responsive engineers, I'd be willing to dual-boot my machine and try again. Otherwise, it's a non-starter.

naibaf7 commented 7 years ago

AMDGPU-PRO also works on Fedora 25/26 for me. Only the OpenCL (compute) parts of the driver are needed. A recent kernel (4.10, 4.11, 4.12) with the open source kernel module of AMDGPU plus Hawaii, Polaris or Vega firmware is sufficient. Please ask if you need more detailed instructions to install the necessary components. OpenCL Caffe currently validates correctly on W9100, R9 390X, RX 480, RX 470, RX 460, GTX 1080, GTX 980, Intel i7-6560U, GT 650M and possibly a lot more, if the correct drivers are installed. It also works regardless of which BLAS library is used. Alternatively, OpenCL Caffe is also supported for AMD and nVidia GPUs on Windows 7/8/10 (Intel GPU support for Windows currently under development). You just cannot use the Clover OpenCL implementations. The only opensource OpenCL implementation stable enough to compile and run Caffe is Intel's Beignet. Otherwise, proprietary components are required.

znmeb commented 7 years ago

I didn't know AMDGPU-Pro was supported on Fedora - I switched from Fedora 25 to Ubuntu 16.04 LTS because I couldn't find support on Fedora. I'd much prefer Fedora to Ubuntu! And if it works on Fedora it can probably be made to work on Arch / Manjaro. You might have to freeze some kernel / library versions to older releases though.

naibaf7 commented 7 years ago

Yeah it does (I use it myself), it's actually quite easy. As a quick guide to which of the steps below apply to your GPU:

Pitcairn = HD7870, R7 265
Tonga = R9 285/380/380X
Tahiti = HD7950/HD7970, R9 280/280X
Hawaii = R9 290/290X/390/390X
Fiji = Fury X/Nano
Polaris = RX 460/470/480/560/570/580
Vega = Vega FE, RX Vega

(incomplete list, check here: https://en.wikipedia.org/wiki/Graphics_Core_Next)
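
The codename list above can be turned into a small lookup. This is only a sketch in Python: the table contains exactly the models named in this comment (so it is incomplete), and the function name codename_for is made up for illustration:

```python
# Codename -> models, copied from the (incomplete) list in this comment.
GCN_CODENAMES = {
    "Pitcairn": ["HD7870", "R7 265"],
    "Tonga": ["R9 285", "R9 380", "R9 380X"],
    "Tahiti": ["HD7950", "HD7970", "R9 280", "R9 280X"],
    "Hawaii": ["R9 290", "R9 290X", "R9 390", "R9 390X"],
    "Fiji": ["Fury X", "Nano"],
    "Polaris": ["RX 460", "RX 470", "RX 480", "RX 560", "RX 570", "RX 580"],
    "Vega": ["Vega FE", "RX Vega"],
}


def _normalize(model):
    """Ignore spacing and case so 'hd 7870' and 'HD7870' compare equal."""
    return model.replace(" ", "").upper()


def codename_for(model):
    """Return the codename for a model in the table, or None if it is not listed."""
    wanted = _normalize(model)
    for codename, models in GCN_CODENAMES.items():
        if wanted in (_normalize(m) for m in models):
            return codename
    return None


if __name__ == "__main__":
    for model in ("RX 480", "hd 7970", "HD 7790"):
        print(model, "->", codename_for(model))
```

A None result only means the model is missing from this short table; the Wikipedia page linked above is the place to look it up.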

What you have to do is:

DO NOT install any other AMDGPU-PRO packages than listed above, because it will not be compatible with the XORG version of Fedora 25/26. But those other components are also not needed for OpenCL to work. Unfortunately, if you use a kernel older than 4.11/4.12 it may be somewhat unstable, but with Fedora 26 at least it works flawlessly (and it works OK on Fedora 25). Using this install method, it is also possible to run Intel, AMD and nVidia OpenCL drivers in parallel without breaking whatever you prefer to run your XORG/Wayland instance with.

I just saw AMD has version 17.30 out now (http://support.amd.com/en-us/kb-articles/Pages/AMDGPU-PRO-Driver-for-Linux-Release-Notes.aspx) which I haven't tested yet, but 17.10 worked and I suppose those do as well, just use the different version numbers of the same RPMs listed above.

NOTE that there are similar instructions for Arch Linux: https://wiki.archlinux.org/index.php/AMDGPU#Enable_Southern_Islands_.28SI.29_and_Sea_Islands_.28CIK.29_support They also have a way to install proprietary OpenCL for AMDGPUs without installing the whole AMDGPU-PRO package.

So for now actually, the driver situation is ideal for computing, as you can get OpenCL to work on almost all distributions of Linux with any AMD GCN GPU.

znmeb commented 7 years ago

I'll try the Arch first ... It would break a bunch of other stuff to go back to Fedora.

naibaf7 commented 7 years ago

May I ask what GPU you are using?

znmeb commented 7 years ago

VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Bonaire XT [Radeon HD 7790/8770 / R7 360 / R9 260/360 OEM]

IIRC it's in the "Sea Islands" family. I've got clpeak functioning on Arch just by installing amdgpu-pro-opencl; I'd already done the kernel stuff a couple of months ago.

znmeb commented 7 years ago

Success:

[----------] Global test environment tear-down
[==========] 2078 tests from 276 test cases ran. (2614277 ms total)
[  PASSED  ] 2078 tests.

Notes:

  1. The key hacks are:
     a. Make sure the kernel amdgpu module is in the initramfs.
     b. Install amdgpu-pro-opencl only. The full amdgpu-pro package has some dependency issues and won't install.
     Now that I have it working I can start filing bugs in the Arch User Repository.

  2. Arch has caffe and caffe-cpu in its repositories. caffe doesn't install but caffe-cpu does. I think their build defaults to CUDA/cuDNN and wouldn't use the AMD GPU even with the right drivers. I'll post something on the repository comments after I get all the stuff documented.

For @innnk: This should work on Manjaro if you can access the Arch User Repository and do the initramfs setup described in https://wiki.archlinux.org/index.php/AMDGPU#Enable_Southern_Islands_.28SI.29_and_Sea_Islands_.28CIK.29_support
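
The initramfs part of that setup comes down to having amdgpu listed in the MODULES line of the mkinitcpio configuration. A rough Python sketch for checking this follows. It assumes the configuration is in /etc/mkinitcpio.conf (the usual Arch location, which Manjaro may or may not share) and accepts both the quoted and the parenthesised MODULES syntax; the function names are invented for illustration:

```python
from pathlib import Path


def mkinitcpio_modules(conf_text):
    """Return the module names from the last uncommented MODULES= line."""
    modules = []
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if line.startswith("MODULES="):
            value = line[len("MODULES="):].strip("()\"' ")
            modules = value.split()
    return modules


def amdgpu_in_initramfs_config(conf_text):
    """True if amdgpu is listed in MODULES."""
    return "amdgpu" in mkinitcpio_modules(conf_text)


if __name__ == "__main__":
    conf = Path("/etc/mkinitcpio.conf")  # assumed location
    if conf.exists():
        print("amdgpu listed:", amdgpu_in_initramfs_config(conf.read_text()))
    else:
        print("no mkinitcpio.conf found at", conf)
```

This only inspects the configuration file; the initramfs still has to be regenerated after any change, as the wiki page describes.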

Now I have to go learn all the math. ;-)

naibaf7 commented 7 years ago

Congratulations, looks good :)

innnk commented 7 years ago

@znmeb I am very happy to hear you succeeded. Arch Linux and amdgpu are rarely used for GPGPU; I spent a lot of time searching for similar cases but found nothing. I will try again with your suggestion.

@naibaf7 Thanks so much for your patient guidance. I understand what to do next from your replies, but I've only just started learning Manjaro (Arch Linux) and still need some help. Here are my questions:

  1. When I search for installation instructions, they are for Ubuntu, CentOS or RHEL, but not Arch Linux. Since I am on Manjaro (Arch Linux), which instructions should I follow?
  2. I've installed xf86-video-amdgpu, which is recommended by the Arch Linux wiki. Is it necessary to uninstall it before installing amdgpu-pro?
  3. I can't find detailed steps for checking whether the kernel config has CONFIG_DRM_AMDGPU_CIK=y in https://wiki.archlinux.org/index.php/AMDGPU#Enable_Southern_Islands_.28SI.29_and_Sea_Islands_.28CIK.29_support. What should I do?
  4. I used sudo dnf install -- RPMS/x86_64/libopencl-amdgpu-pro-17.30-458935.el7.x86_64.rpm, and it shows:
    Unable to detect release version (use '--releasever' to specify release version)
    Error:
    Problem: conflicting requests
    - nothing provides amdgpu-pro-core needed by libopencl-amdgpu-pro-17.30-458935.el7.x86_64

    I then used rpm -i RPMS/x86_64/libopencl-amdgpu-pro-17.30-458935.el7.x86_64.rpm, but errors occurred: libopencl-amdgpu-pro-17.30-458935.el7.x86_64.rpm needs /bin/sh, amdgpu-pro-core, ids-amdgpu-pro, etc.

Do I need to install something before installing amdgpu-pro? Or just install the packages in the right order? Or is there a dnf command that can install them together?

I've only just started learning Linux. Could you tell me what to do step by step? Thanks a lot :).

naibaf7 commented 7 years ago

If you are on Manjaro/Arch, please follow these steps, and do NOT install the RPM files for Fedora/RHEL: https://wiki.archlinux.org/index.php/Kernels/Arch_Build_System

This should show you how to build a custom kernel. In that kernel, you have to enable CIK for your AMD GPU at the make nconfig step (in the PKGBUILD). The rest is described in the tutorial there. Good luck.

It's a shame AMDGPU is not yet selected as the default driver over RADEON (which is useless for computing) on CIK/SI hardware.
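
On the earlier question of how to tell whether CONFIG_DRM_AMDGPU_CIK=y is set: one possible check, sketched in Python, is to read the configuration of the running kernel. This assumes the kernel exposes it at /proc/config.gz (which requires the kernel to have been built with that feature); the function names are made up for illustration, and the SI option is included only because the wiki section linked in this thread covers both families:

```python
import gzip
from pathlib import Path


def kernel_option(config_text, name):
    """Return the value of a kernel config option ('y', 'm', ...), or None if it is not set."""
    for line in config_text.splitlines():
        line = line.strip()
        # Unset options appear as '# CONFIG_FOO is not set' and are skipped here.
        if line.startswith(name + "="):
            return line.split("=", 1)[1]
    return None


def read_running_config(path="/proc/config.gz"):
    """Return the running kernel's config as text, or None if it is not exposed."""
    p = Path(path)
    if not p.exists():
        return None
    with gzip.open(p, "rt") as fh:
        return fh.read()


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("kernel config not available at /proc/config.gz")
    else:
        for opt in ("CONFIG_DRM_AMDGPU_SI", "CONFIG_DRM_AMDGPU_CIK"):
            print(opt, "=", kernel_option(text, opt))
```

If an option comes back as None, the running kernel was built without it, and a custom kernel along the lines of the tutorial above would be needed.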