Open 1970633640 opened 2 years ago
How were you able to build caffe with OpenCL support, using linux?
In my tests, OpenCL Caffe is slower than the CUDA version on an NVIDIA GPU and far too slow on an Intel CPU, so I do not recommend it.
But if you want to use it anyway:
First, install the libraries below (copied from the project's Dockerfile):
sudo apt-get update
sudo apt-get install -y --no-install-recommends \
build-essential \
cmake \
git \
wget \
libatlas-base-dev \
libboost-all-dev \
libgflags-dev \
libgoogle-glog-dev \
libhdf5-serial-dev \
libleveldb-dev \
liblmdb-dev \
libopencv-dev \
libprotobuf-dev \
libsnappy-dev \
protobuf-compiler \
python-dev \
python-numpy \
python-pip \
python-setuptools \
python-scipy
and
sudo apt-get install python3-pip
sudo apt install libsqlite3-dev
and install all the Python requirements.
Then download and extract the zip file, or:
git clone https://github.com/BVLC/caffe.git
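For the Python requirements mentioned above, here is a sketch assuming the default BVLC layout, where the dependency list lives at python/requirements.txt inside the clone (guarded so it only does anything once the repo exists):

```shell
# Assumption: the cloned repo keeps its Python deps in python/requirements.txt.
if [ -f caffe/python/requirements.txt ]; then
  pip3 install -r caffe/python/requirements.txt
else
  echo "requirements file not found; clone the repo first"
fi
```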
A GPU driver or an Intel OpenCL runtime is required, too. You can install the NVIDIA driver (the driver alone should be enough, though installing full CUDA may help) or the "Intel SDK for OpenCL Applications" runtime for the CPU.
Finally, in the extracted code folder:
mkdir build
cd build
cmake .. -DUSE_OPENCL=1
make -j8 (replace 8 with your number of CPU cores)
Your OpenCV version may cause compile errors. If that happens, replace "CV_LOAD_IMAGE_COLOR" and "CV_LOAD_IMAGE_GRAYSCALE" with "cv::IMREAD_COLOR" and "cv::IMREAD_GRAYSCALE" in the code (use VSCode, JetBrains IDEs, etc. to search across all source files) and compile again.
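If there are many occurrences, a sed one-liner can do the replacement tree-wide. A sketch, demonstrated on a scratch file so it is safe to try — in the real tree you would point the same sed at src/, include/ and tools/:

```shell
# Demo of the macro replacement on a scratch file.
mkdir -p /tmp/cv_fix_demo
cat > /tmp/cv_fix_demo/io.cpp <<'EOF'
cv::Mat img  = cv::imread(path, CV_LOAD_IMAGE_COLOR);
cv::Mat gray = cv::imread(path, CV_LOAD_IMAGE_GRAYSCALE);
EOF
# Swap the removed OpenCV 2.x constants for their cv::IMREAD_* equivalents.
sed -i 's/CV_LOAD_IMAGE_COLOR/cv::IMREAD_COLOR/g; s/CV_LOAD_IMAGE_GRAYSCALE/cv::IMREAD_GRAYSCALE/g' /tmp/cv_fix_demo/io.cpp
cat /tmp/cv_fix_demo/io.cpp
```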
reference https://github.com/BVLC/caffe/issues/6680
After a successful compilation, I can use -gpu 0 to select the NVIDIA GPU as the OpenCL device, -gpu 1 to select the Intel CPU as an OpenCL device, and -cpu to run on the Intel CPU as a plain CPU device when training, etc.
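For reference, invocations might look like the following (paths and the solver file are hypothetical; adjust to your setup — the block guards on the binary existing so it is safe to paste):

```shell
# Assumed layout: caffe binary under build/tools, a solver.prototxt in cwd.
CAFFE=./build/tools/caffe
if [ -x "$CAFFE" ]; then
  "$CAFFE" device_query -gpu 0                     # inspect device 0
  "$CAFFE" train --solver=solver.prototxt -gpu 0   # train on OpenCL device 0
  "$CAFFE" train --solver=solver.prototxt -cpu     # train on the CPU backend
else
  echo "caffe binary not found; build it first"
fi
```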
I prefer the Ubuntu 16.04 and 18.04 LTS versions and have built successfully on both, but this should work on other systems too.
Question: can two OpenCL devices from different platforms (e.g. 1x NVIDIA GPU + 1x AMD GPU) be used at the same time to accelerate training in OpenCL Caffe?
In my test, after initialization the program exits with an error before the first iteration (host_unified_memory already disabled).
It should be possible to speed up training with multiple OpenCL devices, right? Isn't that supposed to be an advantage of OpenCL? Are there any forks of OpenCL Caffe, or any instructions, for achieving a speed-up with multiple OpenCL devices? Could someone point to projects, documents, or instructions for using several OpenCL devices at the same time?
My guess: if two (or more) devices are initialized in ViennaCL with one command queue assigned to each device, could the OpenCL kernels be split up during training and dispatched to those queues with clEnqueueNDRangeKernel?
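Sketching that guess as pseudocode (not working code, and not something I have seen implemented in OpenCL Caffe). One constraint worth noting: an OpenCL context cannot mix devices from different platforms, so an NVIDIA device and an AMD device each need their own context and queue, and gradients can only be exchanged through the host:

```
# Pseudocode only: data-parallel iteration over two OpenCL devices
# from different platforms (NVIDIA + AMD).
ctx_nv  = clCreateContext(nvidia_platform, nvidia_gpu)
ctx_amd = clCreateContext(amd_platform,    amd_gpu)
q_nv    = clCreateCommandQueue(ctx_nv,  nvidia_gpu)
q_amd   = clCreateCommandQueue(ctx_amd, amd_gpu)

for each training iteration:
    (half_a, half_b) = split(minibatch)
    upload(half_a -> ctx_nv);  upload(half_b -> ctx_amd)
    clEnqueueNDRangeKernel(q_nv,  forward/backward kernels, half_a)
    clEnqueueNDRangeKernel(q_amd, forward/backward kernels, half_b)
    clFinish(q_nv); clFinish(q_amd)
    # buffers cannot be shared across contexts/platforms, so gradients
    # come back through host memory
    grad = average(read(ctx_nv), read(ctx_amd))
    apply_update(grad); write updated weights to both contexts
```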