clMathLibraries / clBLAS

a software library containing BLAS functions written in OpenCL
Apache License 2.0

test-functional fails xgemm.cc #207

Closed mpekalski closed 8 years ago

mpekalski commented 8 years ago

I compiled clBLAS cloned from git (v2.8) without major problems, but all kinds of tests fail, for example

$ /opt/clBLAS/bin/test-functional 

Initialize OpenCL and clblas...
---- Advanced Micro Devices, Inc.
SetUp: about to create command queues
[==========] Running 715 tests from 5 test cases.
[----------] Global test environment set-up.
[----------] 203 tests from ERROR
[ RUN      ] ERROR.InvalidCommandQueue
OpenCL error -36 on line 350 of /home/marcin/Downloads/clBLAS/src/library/blas/xgemm.cc
test-functional: /home/marcin/Downloads/clBLAS/src/library/blas/xgemm.cc:350: clblasStatus clblasGemm(clblasOrder, clblasTranspose, clblasTranspose, size_t, size_t, size_t, Precision, cl_mem, size_t, size_t, cl_mem, size_t, size_t, Precision, cl_mem, size_t, size_t, cl_uint, _cl_command_queue**, cl_uint, _cl_event* const*, _cl_event**) [with Precision = float; clblasStatus = clblasStatus_; clblasOrder = clblasOrder_; clblasTranspose = clblasTranspose_; size_t = long unsigned int; cl_mem = _cl_mem*; cl_uint = unsigned int; cl_command_queue = _cl_command_queue*; cl_event = _cl_event*]: Assertion `false' failed.
Aborted (core dumped)

I ran it on Lubuntu (upgraded straight after installation to kernel 4.2.0-22) with a Radeon R9 290. First I installed the drivers from AMD's website (15.12), then AMDSDK-3.0, and then ACML 5.3.1 with the 6.1.0.31 update. At the moment clRNG and clFFT compile and run without problems. I also tried compiling with gcc/g++ 4.7 and 4.8, and using fglrx-update, but with the same result.

I used gcc 5.2.1 g++ 5.2.1

uname -am

Linux thesun 4.2.0-22-generic #27-Ubuntu SMP Thu Dec 17 22:57:08 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

clinfo

Number of platforms:                 1
  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 2.0 AMD-APP (1912.5)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:               Advanced Micro Devices, Inc.
  Platform Extensions:               cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 

dpkg -l fglrx fglrx-core fglrx-dev fglrx-amdcccle

||/ Name           Version      Architecture Description
+++-==============-============-============-=================================
ii  fglrx          2:15.302-0ub amd64        Video driver for the AMD graphics
ii  fglrx-amdcccle 2:15.302-0ub amd64        Catalyst Control Center for the A
ii  fglrx-core     2:15.302-0ub amd64        Minimal video driver for the AMD 

fglrxinfo

display: :0  screen: 0
OpenGL vendor string: Advanced Micro Devices, Inc.
OpenGL renderer string: AMD Radeon R9 200 Series
OpenGL version string: 4.5.13416 Compatibility Profile Context 15.302

cmake - mCMakeCache.txt

cmake ../src -DOPENCL_VERSION:STRING=2.0 -DACML_INCLUDE_DIRS:PATH=/opt/acml5.3.1/gfortran64_mp/include -DACML_LIBRARIES:FILEPATH=/opt/acml5.3.1/gfortran64_mp/lib/libacml_mp.so -DBLAS_DEBUG_TOOLS=ON -DOPENCL_OFFLINE_BUILD_HAWAII_KERNEL=ON -DBUILD_PERFORMANCE=ON -DCMAKE_INSTALL_PREFIX=/opt/clBLAS -DBUILD_SHARED_LIBS=ON -DUSE_SYSTEM_GTEST=ON -DOPENCL_LIBRARIES=/usr/lib/libOpenCL.so.1

UPDATE 1: I tried downgrading the driver from 1912.5 to 1800.8, but it did not help (clinfo itself crashed)

clinfo

Number of platforms:                 1
  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 2.0 AMD-APP (1800.8)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:               Advanced Micro Devices, Inc.
  Platform Extensions:               cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 

UPDATE 2: The error is caused by code around line 100 in src/tests/functional/func-error.cpp

TEST(ERROR, InvalidEventWaitList) {
    ErrorClass<GemmMetod<float> > ec;
    ec.error(CL_INVALID_EVENT_WAIT_LIST);
}

I tried putting some other tests in front of this one, and they passed.

mpekalski commented 8 years ago

I found a solution, but it is kind of strange or at least unexpected.

This does not work

/opt/clBLAS/bin/test-functional

but this does

sudo /opt/clBLAS/bin/test-functional

and in the end

[----------] Global test environment tear-down
[==========] 715 tests from 5 test cases ran. (220040 ms total)
[  PASSED  ] 715 tests.

any ideas why I need sudo to run the tests?

pavanky commented 8 years ago

Are you running this over ssh?

mpekalski commented 8 years ago

No, everything is run locally.

All the required libraries that I installed manually went into /opt: gflags, glog, gtest, acml, amdappsdk, boost, and clblas. But I do not think that has anything to do with the tests requiring root.

mpekalski commented 8 years ago

/var/log/apport.log had these kinds of entries when the test crashed

ERROR: apport (pid 1055) Sun Jan  3 01:14:00 2016: debug: session gdbus call:
ERROR: apport (pid 1055) Sun Jan  3 01:14:01 2016: this executable already crashed 2 times, ignoring
ERROR: apport (pid 28138) Sun Jan  3 01:25:35 2016: called for pid 27038, signal 6, core limit 0
ERROR: apport (pid 28138) Sun Jan  3 01:25:35 2016: executable: /opt/clBLAS/bin/test-functional (command line "/opt/clBLAS/bin/test-functional")
ERROR: apport (pid 28138) Sun Jan  3 01:25:35 2016: gdbus call error: Error: GDBus.Error:org.freedesktop.DBus.Error.ServiceUnknown: The name org.gnome.SessionManager was not provided by any .service files

mpekalski commented 8 years ago

Another interesting thing is that the output of the same test differs slightly between root and a regular user. As root you can see a line, Invalid Size for A, which does not appear otherwise.

/opt/clBLAS/bin/test-functional --gtest_filter=InvalidMemObjecttrmv

Initialize OpenCL and clblas...
---- Advanced Micro Devices, Inc.
SetUp: about to create command queues
Note: Google Test filter = *InvalidMemObjecttrmv*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ERROR
[ RUN      ] ERROR.InvalidMemObjecttrmv
[       OK ] ERROR.InvalidMemObjecttrmv (36 ms)
[----------] 1 test from ERROR (36 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (37 ms total)
[  PASSED  ] 1 test.

sudo /opt/clBLAS/bin/test-functional --gtest_filter=InvalidMemObjecttrmv

Initialize OpenCL and clblas...
---- Advanced Micro Devices, Inc.
SetUp: about to create command queues
Note: Google Test filter = *InvalidMemObjecttrmv*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ERROR
[ RUN      ] ERROR.InvalidMemObjecttrmv
Invalid Size for A
[       OK ] ERROR.InvalidMemObjecttrmv (35 ms)
[----------] 1 test from ERROR (35 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (35 ms total)
[  PASSED  ] 1 test.
TimmyLiu commented 8 years ago

is it true that "clinfo" crashes while "sudo clinfo" works?

mpekalski commented 8 years ago

I experienced the crash of clinfo before I renamed libamdocl12cl64.so coming from AMDAPPSDK-3.0 so it would not be picked up by clinfo. After renaming it worked fine.

At the moment I have it renamed and clinfo works for both root and non-root accounts.

Now, after thinking I had solved the problem with test-functional, I played a bit with test-correctness (compiling kernels, etc.), and test-functional does not work anymore. I have no idea why, so I am back to square one. It might be interesting that the test I mentioned above, which was reporting "Invalid Size for A", no longer prints that line under sudo, but as I said, test-functional as a whole does not go through.

In general which libraries (libOpenCL.so, libamdocl*64.so) should I link to? The ones coming from a driver or SDK?

Also does it make any difference if I use header files for OpenCL coming from khronos.org or from the SDK?

mpekalski commented 8 years ago

So I set up a new system, and below are all the commands I used, from a fresh installation to running test-functional. I do not know if it helps, but maybe somebody will spot where I did something wrong. Where I got some specific output, I put the command in bold and then the output in a code block.

sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get dist-upgrade -y
reboot

**uname -a**

Linux thesun 4.2.0-22-generic #27-Ubuntu SMP Thu Dec 17 22:57:08 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
sudo apt-get install  libgl1-mesa-dev freeglut3 freeglut3-dev  binutils libglm-dev mesa-utils build-essential cdbs dh-make  fakeroot libqtgui4 -y vim cmake gcc g++ checkinstall gfortran aptitude htop alien vim dh-modaliases execstack dkms lib32gcc1 git  -y

cd $HOME/Downloads
unzip radeon-crimson-15.12-15.302-151217a-297685e.zip
cd fglrx-15.302/
sudo ./amd-driver-installer-15.302-x86.x86_64.run

After the installation of the driver I got this message

** (zenity:27418): WARNING **: Error retrieving accessibility bus address: org.freedesktop.DBus.Error.ServiceUnknown: The name org.a11y.Bus was not provided by any .service files
Gtk-Message: GtkDialog mapped without a transient parent. This is discouraged.

_dpkg -l fglrx fglrx-core fglrx-dev fglrx-amdcccle_

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                  Version                 Architecture            Description
+++-=====================================-=======================-=======================-================================================================================
ii  fglrx                                 2:15.302-0ubuntu1       amd64                   Video driver for the AMD graphics accelerators
ii  fglrx-amdcccle                        2:15.302-0ubuntu1       amd64                   Catalyst Control Center for the AMD graphics accelerators
ii  fglrx-core                            2:15.302-0ubuntu1       amd64                   Minimal video driver for the AMD graphics accelerators
dpkg-query: no packages found matching fglrx-dev
sudo dpkg -i fglrx-dev_15.302-0ubuntu1_amd64.deb

Add to ~/.bashrc

export PATH="/opt/boost_1_60_0/bin:$PATH"
export LD_LIBRARY_PATH="/opt/AMDAPPSDK-3.0/lib/x86:/opt/AMDAPPSDK-3.0/lib/x86_64:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="/usr/local/lib64:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="/opt/acml5.3.1/gfortran64_mp/lib:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="/opt/boost_1_60_0/lib:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="/opt/googletest/lib:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="/opt/clBLAS/lib64:$LD_LIBRARY_PATH"
export TMP="/tmp"
export TMPDIR="/tmp"
export ACML_ROOT="/opt/aclm5.3.1"
export ACML_INCLUDE_DIRS="/opt/acml5.3.1/gfortran64_mp/include"
export ACML_LIBRARIES="/opt/acml5.3.1/gfortran64_mp/lib"
export BOOST_ROOT="/opt/boost_1_60_0"

export CLBLAS_STORAGE_PATH="/opt/clBLAS"
export CLBLAS_ROOT="/opt/clBLAS"
export CLBLAS_INCLUDE_DIR="/opt/clBLAS/include"

export FFTW_LIBRARIES="/usr/lib/x86_64-linux-gnu"
export FFTW_INCLUDE_DIRS="/usr/include"
export GTEST_LIBRARY="/opt/googletest/lib"
export GTEST_INCLUDE_DIR="/opt/googletest/include"
export GTEST_ROOT="/opt/googletest"
export GFLAGS_INCLUDE_DIRS="/opt/gflags/include"
export GFLAGS_LIBRARY="/opt/gflags/lib"
export GLOG_INCLUDE_DIR="/opt/glog/include"
export GLOG_LIBRARY="/opt/glog/lib"

export AMDAPPSDK_ROOT="/opt/AMDAPPSDK-3.0"
export AMDAPPSDKROOT="/opt/AMDAPPSDK-3.0"
export AMDAPP="/opt/AMDAPPSDK-3.0"

export CXX=g++
export CC=gcc
export FC=gfortran
export MKL_CBWR=AUTO

########## Boost 1.60

cd $HOME/Downloads
wget --output-document=boost_1_60_0.tar.gz "http://downloads.sourceforge.net/project/boost/boost/1.60.0/boost_1_60_0.tar.gz?r=http%3A%2F%2Fwww.boost.org%2Fusers%2Fhistory%2Fversion_1_60_0.html&ts=1451138817&use_mirror=heanet"
tar -zxvf boost_1_60_0.tar.gz
cd boost_1_60_0/
sudo ./bootstrap.sh --prefix=/opt/boost_1_60_0
sudo ./b2 install -j9 --with-program_options address-model=64 

########## GFlags

cd $HOME/Downloads
git clone https://github.com/gflags/gflags
cd gflags
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/opt/gflags -DBUILD_SHARED_LIBS=ON
make -j4
sudo checkinstall --pkgname=gflags

########## GTest

cd $HOME/Downloads
git clone https://github.com/google/googletest.git
cd googletest
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/opt/googletest -DBUILD_SHARED_LIBS=ON
make -j4
sudo checkinstall --pkgname=gtest

########## SANITY CHECK 1

ldconfig -p | grep -i 'OPENCL\|amd\|blas\|lapack\|libgl' | grep "=> /" | awk '{print $4}' | xargs ls -la | grep " -> " | awk '{print $11}' | xargs locate

/lib/x86_64-linux-gnu/libglib-2.0.so.0.4600.1
/usr/lib/libAMDXvBA.so.1
/usr/lib/libAMDXvBA.so.1.0
/usr/lib/fglrx/libGL.so.1
/usr/lib/fglrx/libGL.so.1.2
/usr/lib/x86_64-linux-gnu/libGLU.so.1.3.1
/usr/lib/x86_64-linux-gnu/libdrm_amdgpu.so.1.0.0
/usr/lib/x86_64-linux-gnu/libglapi.so.0.0.0
/usr/lib/x86_64-linux-gnu/libglut.so.3.9.0
/usr/lib/x86_64-linux-gnu/libgslcblas.so.0.0.0
/usr/lib/x86_64-linux-gnu/libsamdb.so.0.0.1
/usr/lib/x86_64-linux-gnu/mesa/libGL.so
/usr/lib/x86_64-linux-gnu/mesa/libGL.so.1
/usr/lib/x86_64-linux-gnu/mesa/libGL.so.1.2.0
/usr/lib32/libAMDXvBA.so.1
/usr/lib32/libAMDXvBA.so.1.0
/usr/lib32/fglrx/libAMDXvBA.so.1
/usr/lib32/fglrx/libAMDXvBA.so.1.0
/usr/lib32/fglrx/libGL.so.1
/usr/lib32/fglrx/libGL.so.1.2

clinfo

Number of platforms:                 1
  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 2.0 AMD-APP (1912.5)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:               Advanced Micro Devices, Inc.
  Platform Extensions:               cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 

locate libOpenCL.so

/usr/lib/libOpenCL.so.1
/usr/lib32/libOpenCL.so.1

########## ACML-6.1.0.31

cd $HOME/Downloads
mkdir acml-5-3-1
tar -zxvf acml-5-3-1-gfortran-64bit.tgz -C acml-5-3-1/
cd acml-5-3-1
sudo ./install-acml-5-3-1-gfortran-64bit.sh
## UPDATE 6.1.0
cd $HOME/Downloads
sudo tar -xzvf acml-6.1.0.31-gfortran64.tgz -C /opt/acml5.3.1/

sudo updatedb
sudo ldconfig

########## AMD SDK OpenCL

cd $HOME/Downloads
tar -xvf AMD-APP-SDKInstaller-v3.0.130.135-GA-linux64.tar.bz2
sudo ./AMD-APP-SDK-v3.0.130.135-GA-linux64.sh 
reboot

########## SANITY CHECK 2

ldconfig -p | grep -i 'opencl\|amd\|blas\|lapack\|libgl'

    libsamdb.so.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libsamdb.so.0
    libgslcblas.so.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libgslcblas.so.0
    libglut.so.3 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libglut.so.3
    libglut.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libglut.so
    libglib-2.0.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libglib-2.0.so.0
    libglapi.so.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libglapi.so.0
    libglapi.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libglapi.so
    libdrm_amdgpu.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libdrm_amdgpu.so.1
    libdrm_amdgpu.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libdrm_amdgpu.so
    libamdocl64.so (libc6,x86-64) => /usr/lib/libamdocl64.so
    libamdocl32.so (libc6) => /usr/lib32/libamdocl32.so
    libamdocl12cl64.so (libc6,x86-64) => /usr/lib/libamdocl12cl64.so
    libamdocl12cl32.so (libc6) => /usr/lib32/libamdocl12cl32.so
    libOpenCL.so.1 (libc6,x86-64) => /usr/lib/libOpenCL.so.1
    libOpenCL.so.1 (libc6) => /usr/lib32/libOpenCL.so.1
    libGLU.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libGLU.so.1
    libGLU.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libGLU.so
    libGL.so.1 (libc6,x86-64) => /usr/lib/fglrx/libGL.so.1
    libGL.so.1 (libc6) => /usr/lib32/fglrx/libGL.so.1
    libGL.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libGL.so
    libGL.so (libc6,x86-64) => /usr/lib/fglrx/libGL.so
    libGL.so (libc6,x86-64) => /usr/lib/libGL.so
    libGL.so (libc6) => /usr/lib32/fglrx/libGL.so
    libAMDXvBA.so.1 (libc6,x86-64) => /usr/lib/libAMDXvBA.so.1
    libAMDXvBA.so.1 (libc6) => /usr/lib32/fglrx/libAMDXvBA.so.1
    libAMDXvBA.so.1 (libc6) => /usr/lib32/libAMDXvBA.so.1
    libAMDXvBA.so (libc6,x86-64) => /usr/lib/libAMDXvBA.so
    libAMDXvBA.so (libc6) => /usr/lib32/fglrx/libAMDXvBA.so

clinfo

 Number of platforms:                1
  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 2.0 AMD-APP (1912.5)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:               Advanced Micro Devices, Inc.
  Platform Extensions:               cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
...
  Preferred platform atomic alignment:       0
  Preferred global atomic alignment:         0
  Preferred local atomic alignment:      0
Segmentation fault (core dumped)

fixing problem with clinfo

sudo mv /opt/AMDAPPSDK-3.0/lib/x86_64/libamdocl12cl64.so /opt/AMDAPPSDK-3.0/lib/x86_64/libamdocl12cl64.so_old
sudo ldconfig

clinfo

 Number of platforms:                1
  Platform Profile:              FULL_PROFILE
  Platform Version:              OpenCL 2.0 AMD-APP (1912.5)
  Platform Name:                 AMD Accelerated Parallel Processing
  Platform Vendor:               Advanced Micro Devices, Inc.
  Platform Extensions:               cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
...
  Version:                   OpenCL 1.2 AMD-APP (1912.5)
  Extensions:                    cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_spir cl_khr_gl_event 

check if visible libOpenCL is linked library or not:

ldconfig -p | grep -i libOpenCL.so

    libOpenCL.so.1 (libc6,x86-64) => /usr/lib/libOpenCL.so.1
    libOpenCL.so.1 (libc6) => /usr/lib32/libOpenCL.so.1

**la -la /usr/lib/libOpenCL.***

-rw-r--r-- 1 root root 27336 jan  4 18:57 /usr/lib/libOpenCL.so.1

**la -la /usr/lib32/libOpenCL.***

-rw-r--r-- 1 root root 29600 jan  4 18:57 /usr/lib32/libOpenCL.so.1

########## clBLAS

cd $HOME/Downloads
git clone https://github.com/clMathLibraries/clBLAS
cd clBLAS
git checkout v2.8
mkdir build
cd build
cmake ../src -DOPENCL_VERSION:STRING=2.0 -DACML_INCLUDE_DIRS:PATH=/opt/acml5.3.1/gfortran64_mp/include -DACML_LIBRARIES:FILEPATH=/opt/acml5.3.1/gfortran64_mp/lib/libacml_mp.so -DBLAS_DEBUG_TOOLS=ON -DOPENCL_OFFLINE_BUILD_HAWAII_KERNEL=ON -DBUILD_PERFORMANCE=ON -DCMAKE_INSTALL_PREFIX=/opt/clBLAS -DBUILD_SHARED_LIBS=ON -DUSE_SYSTEM_GTEST=ON -DOPENCL_LIBRARIES=/usr/lib/libOpenCL.so.1   -DOPENCL_INCLUDE_DIRS=/opt/AMDAPPSDK-3.0/include

Main parts of CMake output

-- You have confirmed OpenCL 2.0 is supported in your system
-- CORR_TEST_WITH_ACML set to ON
-- The C compiler identification is GNU 5.2.1
-- The CXX compiler identification is GNU 5.2.1
-- Using default OpenCL Compiler
-- Found ACML: /opt/acml5.3.1/gfortran64_mp/lib/libacml_mp.so
-- Found OPENCL: /usr/lib/libOpenCL.so.1  
-- Boost version: 1.60.0
-- Found GTest: /opt/googletest/lib/libgtest.so
make -j9
sudo checkinstall --pkgname=clblas

/opt/clBLAS/bin/test-functional

Initialize OpenCL and clblas...
---- Advanced Micro Devices, Inc.
SetUp: about to create command queues
[==========] Running 715 tests from 5 test cases.
[----------] Global test environment set-up.
[----------] 203 tests from ERROR
[ RUN      ] ERROR.InvalidCommandQueue
OpenCL error -36 on line 350 of /home/marcin/Downloads/clBLAS/src/library/blas/xgemm.cc
test-functional: /home/marcin/Downloads/clBLAS/src/library/blas/xgemm.cc:350: clblasStatus clblasGemm(clblasOrder, clblasTranspose, clblasTranspose, size_t, size_t, size_t, Precision, cl_mem, size_t, size_t, cl_mem, size_t, size_t, Precision, cl_mem, size_t, size_t, cl_uint, _cl_command_queue**, cl_uint, _cl_event* const*, _cl_event**) [with Precision = float; clblasStatus = clblasStatus_; clblasOrder = clblasOrder_; clblasTranspose = clblasTranspose_; size_t = long unsigned int; cl_mem = _cl_mem*; cl_uint = unsigned int; cl_command_queue = _cl_command_queue*; cl_event = _cl_event*]: Assertion `false' failed.

That should be it, I hope I did not miss anything.

I also tried removing the existing ACML 6 and replacing it with only ACML 5.3.1, then rebuilding clBLAS, but it did not work. It is worth mentioning that I tried test-functional as both root and non-root users.

Any suggestions on which versions of the kernel and components (Boost, ACML, GTest, GFlags, drivers, SDK) it should work with?

TimmyLiu commented 8 years ago

I think I may know what's happening.

test-functional feeds an invalid command queue to clblas and expects a return value of clblasInvalidCommandQueue, as defined in clblas.h.

In xgemm.cc there is an assert statement right after a call to clGetCommandQueueInfo. With an invalid queue that call of course returns an error code, so the assertion fails.

I think this is a bug: the invalid command queue is not handled properly. Instead of aborting, the clblas API should return the error code quietly.
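
The difference between aborting and returning the error can be sketched in isolation. This is a minimal illustration, not the code in xgemm.cc: `FakeQueue`, `fakeGetQueueInfo`, `gemmAborting`, and `gemmReturning` are hypothetical stand-ins, and the only value taken from the thread is the error code -36 from the log (the numeric value of CL_INVALID_COMMAND_QUEUE).

```cpp
#include <cassert>
#include <cstdio>

// Illustrative stand-ins only; these are NOT the clBLAS or OpenCL declarations.
using FakeQueue = void*;
constexpr int kSuccess = 0;
constexpr int kInvalidCommandQueue = -36;  // numeric value of CL_INVALID_COMMAND_QUEUE

// Hypothetical query: fails for a null queue, the way a runtime would
// reject an invalid handle.
int fakeGetQueueInfo(FakeQueue queue) {
    return queue == nullptr ? kInvalidCommandQueue : kSuccess;
}

// Pattern described as the bug: any failure of the query aborts the process.
int gemmAborting(FakeQueue queue) {
    const int err = fakeGetQueueInfo(queue);
    if (err != kSuccess) {
        std::fprintf(stderr, "OpenCL error %d\n", err);
        assert(false);  // kills the whole test binary (no-op if NDEBUG is defined)
    }
    return kSuccess;
}

// Suggested behaviour: hand the error code back to the caller.
int gemmReturning(FakeQueue queue) {
    const int err = fakeGetQueueInfo(queue);
    if (err != kSuccess) {
        return err;  // the caller can compare this with the expected status
    }
    return kSuccess;
}
```

With this sketch, `gemmReturning(nullptr)` hands -36 back to the caller, which a test such as ERROR.InvalidCommandQueue could compare against the expected status, while `gemmAborting(nullptr)` ends the process before any comparison can happen.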

Maybe for the purpose of using the library you can ignore test-functional for now. Have you tried test-short, which tests correctness (it is a shorter version of test-correctness)?

mpekalski commented 8 years ago

I commented out two tests, and now I am able to run test-functional. It still fails on a couple of tests, see below.

The tests I commented out before building clBLAS:

TEST(ERROR, InvalidCommandQueue) {
    ErrorClass<GemmMetod<float> > ec;
    ec.error(CL_INVALID_COMMAND_QUEUE);
}

TEST(ERROR, InvalidEventWaitList) {
    ErrorClass<GemmMetod<float> > ec;
    ec.error(CL_INVALID_EVENT_WAIT_LIST);
}

Summary of test-functional

[----------] Global test environment tear-down
[==========] 713 tests from 5 test cases ran. (237051 ms total)
[  PASSED  ] 706 tests.
[  FAILED  ] 7 tests, listed below:
[  FAILED  ] ERROR.InvalidMemObject
[  FAILED  ] ERROR.InvalidValue
[  FAILED  ] ERROR.InvalidValuesymm
[  FAILED  ] ERROR.InvalidValuehemm
[  FAILED  ] THREAD.cgemm
[  FAILED  ] THREAD.dgemm
[  FAILED  ] THREAD.zgemm

Details of failed tests

[ RUN      ] ERROR.InvalidMemObject
/home/marcin/Downloads/clBLAS/src/tests/functional/func-error.cpp:118: Failure
Value of: err_etalon
  Actual: -1022
Expected: err
Which is: 0
clFinish()
[  FAILED  ] ERROR.InvalidMemObject (566 ms)
[ RUN      ] ERROR.InvalidValue
/home/marcin/Downloads/clBLAS/src/tests/functional/func-error.cpp:118: Failure
Value of: err_etalon
  Actual: -1011
Expected: err
Which is: 0
clFinish()
[  FAILED  ] ERROR.InvalidValue (149 ms)
[ RUN      ] ERROR.InvalidValuesymm
/home/marcin/Downloads/clBLAS/src/tests/functional/func-error.cpp:118: Failure
Value of: err_etalon
  Actual: -1010
Expected: err
Which is: -1011
clFinish()
[  FAILED  ] ERROR.InvalidValuesymm (188 ms)
[ RUN      ] ERROR.InvalidValuehemm
/home/marcin/Downloads/clBLAS/src/tests/functional/func-error.cpp:118: Failure
Value of: err_etalon
  Actual: -1010
Expected: err
Which is: -1011
clFinish()
[  FAILED  ] ERROR.InvalidValuehemm (256 ms)

In these two tests I got some additional output (Invalid Size for X), which I guess should not be there:

[ RUN      ] ERROR.InvalidMemObjectnrm2
Invalid Size for X
[       OK ] ERROR.InvalidMemObjectnrm2 (2 ms)
[ RUN      ] ERROR.InvalidValuenrm2
Invalid Size for X
[       OK ] ERROR.InvalidValuenrm2 (3 ms)
[ RUN      ] THREAD.cgemm
m : 0    n: 0
/home/marcin/Downloads/clBLAS/src/tests/include/matrix.h:397: Failure
The difference between ((a).s[0]) and ((b).s[0]) is 163485, which exceeds delta, where
((a).s[0]) evaluates to -163510,
((b).s[0]) evaluates to -25, and
delta evaluates to 0.
m : 0    n: 0
/home/marcin/Downloads/clBLAS/src/tests/include/matrix.h:397: Failure
The difference between ((a).s[0]) and ((b).s[0]) is 163485, which exceeds delta, where
((a).s[0]) evaluates to -163510,
((b).s[0]) evaluates to -25, and
delta evaluates to 0.
m : 0    n: 0
/home/marcin/Downloads/clBLAS/src/tests/include/matrix.h:397: Failure
The difference between ((a).s[0]) and ((b).s[0]) is 1106519114, which exceeds delta, where
((a).s[0]) evaluates to -163510,
((b).s[0]) evaluates to -1106682624, and
delta evaluates to 0.
[  FAILED  ] THREAD.cgemm (1011 ms)
[ RUN      ] THREAD.dgemm
m : 0    n: 0
/home/marcin/Downloads/clBLAS/src/tests/include/matrix.h:327: Failure
The difference between a and b is 100999564502815, which exceeds delta, where
a evaluates to 100999564185861,
b evaluates to -316954, and
delta evaluates to 0.
m : 0    n: 0
/home/marcin/Downloads/clBLAS/src/tests/include/matrix.h:327: Failure
The difference between a and b is 7271968644202680, which exceeds delta, where
a evaluates to 100999564185861,
b evaluates to -7170969080016819, and
delta evaluates to 0.
[  FAILED  ] THREAD.dgemm (1008 ms)
[ RUN      ] THREAD.zgemm
m : 0    n: 0
/home/marcin/Downloads/clBLAS/src/tests/include/matrix.h:472: Failure
The difference between ((a).s[0]) and ((b).s[0]) is 10702371064342262, which exceeds delta, where
((a).s[0]) evaluates to -18416369580656,
((b).s[0]) evaluates to 10683954694761606, and
delta evaluates to 0.
m : 0    n: 0
/home/marcin/Downloads/clBLAS/src/tests/include/matrix.h:472: Failure
The difference between ((a).s[0]) and ((b).s[0]) is 18416368935676, which exceeds delta, where
((a).s[0]) evaluates to -18416369580656,
((b).s[0]) evaluates to -644980, and
delta evaluates to 0.
[  FAILED  ] THREAD.zgemm (1061 ms)
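
The "exceeds delta" lines have the shape of Google Test's EXPECT_NEAR / ASSERT_NEAR failure message. A minimal sketch of that kind of check, assuming a hypothetical helper `isNear` (this is not the code in matrix.h), shows what a delta of 0 implies: the two results must be exactly equal for the comparison to pass.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper mirroring a "near" comparison:
// passes when |a - b| <= delta.
bool isNear(double a, double b, double delta) {
    return std::fabs(a - b) <= delta;
}

// Absolute difference, as reported in the failure message.
double difference(double a, double b) {
    return std::fabs(a - b);
}
```

For the first THREAD.cgemm failure above, `difference(-163510, -25)` is 163485, so `isNear(-163510, -25, 0)` is false; with a delta of 0 only identical values pass.
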
mpekalski commented 8 years ago

Regarding test-short it was really short as it crashed quite soon after starting.

Tests that ran and failed:

[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/0
             seed = 12345, queues = 1, clblasColumnMajor, clblasNoTrans, clblasNoTrans, M = 63, N = 63, K = 63, offA = 0, offB = 0, offC = 0, lda = 63, ldb = 63, ldc = 63
m : 0    n: 0
/home/marcin/Downloads/clBLAS/src/tests/include/matrix.h:327: Failure
The difference between a and b is 121641, which exceeds delta, where
a evaluates to -12,
b evaluates to -121653, and
delta evaluates to 0.
[  FAILED  ] ColumnMajor_SmallRange/GEMM.sgemm/0, where GetParam() = (1, 0, 0, 63, 63, 63, 48-byte object <00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>, 1) (79 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/7
             seed = 12345, queues = 1, clblasColumnMajor, clblasNoTrans, clblasNoTrans, M = 128, N = 128, K = 128, offA = 0, offB = 0, offC = 0, lda = 128, ldb = 128, ldc = 128
m : 0    n: 0
/home/marcin/Downloads/clBLAS/src/tests/include/matrix.h:327: Failure
The difference between a and b is 90788, which exceeds delta, where
a evaluates to -4,
b evaluates to -90792, and
delta evaluates to 0.
[  FAILED  ] ColumnMajor_SmallRange/GEMM.sgemm/7, where GetParam() = (1, 0, 0, 128, 128, 128, 48-byte object <00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>, 1) (26 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/8
             seed = 12345, queues = 1, clblasColumnMajor, clblasNoTrans, clblasTrans, M = 63, N = 63, K = 63, offA = 0, offB = 0, offC = 0, lda = 63, ldb = 63, ldc = 63
m : 0    n: 0
/home/marcin/Downloads/clBLAS/src/tests/include/matrix.h:327: Failure
The difference between a and b is 121641, which exceeds delta, where
a evaluates to -12,
b evaluates to -121653, and
delta evaluates to 0.
[  FAILED  ] ColumnMajor_SmallRange/GEMM.sgemm/8, where GetParam() = (1, 0, 1, 63, 63, 63, 48-byte object <00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>, 1) (80 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/15
             seed = 12345, queues = 1, clblasColumnMajor, clblasNoTrans, clblasTrans, M = 128, N = 128, K = 128, offA = 0, offB = 0, offC = 0, lda = 128, ldb = 128, ldc = 128
m : 0    n: 0
/home/marcin/Downloads/clBLAS/src/tests/include/matrix.h:327: Failure
The difference between a and b is 90788, which exceeds delta, where
a evaluates to -4,
b evaluates to -90792, and
delta evaluates to 0.
[  FAILED  ] ColumnMajor_SmallRange/GEMM.sgemm/15, where GetParam() = (1, 0, 1, 128, 128, 128, 48-byte object <00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>, 1) (26 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/24
             seed = 12345, queues = 1, clblasColumnMajor, clblasTrans, clblasNoTrans, M = 63, N = 63, K = 63, offA = 0, offB = 0, offC = 0, lda = 63, ldb = 63, ldc = 63
m : 0    n: 0
/home/marcin/Downloads/clBLAS/src/tests/include/matrix.h:327: Failure
The difference between a and b is 121641, which exceeds delta, where
a evaluates to -12,
b evaluates to -121653, and
delta evaluates to 0.
[  FAILED  ] ColumnMajor_SmallRange/GEMM.sgemm/24, where GetParam() = (1, 1, 0, 63, 63, 63, 48-byte object <00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>, 1) (78 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/31
             seed = 12345, queues = 1, clblasColumnMajor, clblasTrans, clblasNoTrans, M = 128, N = 128, K = 128, offA = 0, offB = 0, offC = 0, lda = 128, ldb = 128, ldc = 128
m : 0    n: 0
/home/marcin/Downloads/clBLAS/src/tests/include/matrix.h:327: Failure
The difference between a and b is 90788, which exceeds delta, where
a evaluates to -4,
b evaluates to -90792, and
delta evaluates to 0.
[  FAILED  ] ColumnMajor_SmallRange/GEMM.sgemm/31, where GetParam() = (1, 1, 0, 128, 128, 128, 48-byte object <00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>, 1) (27 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/32
             seed = 12345, queues = 1, clblasColumnMajor, clblasTrans, clblasTrans, M = 63, N = 63, K = 63, offA = 0, offB = 0, offC = 0, lda = 63, ldb = 63, ldc = 63
m : 0    n: 0
/home/marcin/Downloads/clBLAS/src/tests/include/matrix.h:327: Failure
The difference between a and b is 121641, which exceeds delta, where
a evaluates to -12,
b evaluates to -121653, and
delta evaluates to 0.
[  FAILED  ] ColumnMajor_SmallRange/GEMM.sgemm/32, where GetParam() = (1, 1, 1, 63, 63, 63, 48-byte object <00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>, 1) (79 ms)
[ RUN      ] ColumnMajor_SmallRange/GEMM.sgemm/39
             seed = 12345, queues = 1, clblasColumnMajor, clblasTrans, clblasTrans, M = 128, N = 128, K = 128, offA = 0, offB = 0, offC = 0, lda = 128, ldb = 128, ldc = 128
m : 0    n: 0
/home/marcin/Downloads/clBLAS/src/tests/include/matrix.h:327: Failure
The difference between a and b is 90788, which exceeds delta, where
a evaluates to -4,
b evaluates to -90792, and
delta evaluates to 0.
[  FAILED  ] ColumnMajor_SmallRange/GEMM.sgemm/39, where GetParam() = (1, 1, 1, 128, 128, 128, 48-byte object <00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00 00-00>, 1) (28 ms)

Segmentation fault (core dumped)

[ RUN      ] ColumnMajor_SmallRange/GEMM.dgemm/0
             seed = 12345, queues = 1, clblasColumnMajor, clblasNoTrans, clblasNoTrans, M = 63, N = 63, K = 63, offA = 0, offB = 0, offC = 0, lda = 63, ldb = 63, ldc = 63
Segmentation fault (core dumped)
TimmyLiu commented 8 years ago

Can you try using netlib's blas (libblas.so) as CPU reference library for testing? Uncheck CORR_TEST_WITH_ACML in cmake and make sure Netlib_BLAS_LIBRARY is found.

mpekalski commented 8 years ago

No problem. Interestingly, when I built libblas.so myself, make failed:

wget http://www.netlib.org/blas/blas-3.6.0.tgz
tar -zxvf blas-3.6.0.tgz

sudo mkdir /opt/netlib_blas
sudo mkdir /opt/netlib_blas/lib
cd BLAS/

Edit make.inc
OPTS = -O3 -shared -m64 -march=native -fPIC

sudo make all
sudo gfortran -shared -Wl,-soname,libnetblas.so -o libblas.so.1.0.1 *.o -lc
sudo ln -s libblas.so.1.0.1 libnetblas.so
sudo cp lib*blas* /opt/netlib_blas/lib

To see whether everything linked ok:

ldd libnetblas.so 
    linux-vdso.so.1 =>  (0x00007ffd42bfd000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f50a0e8c000)
    libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f50a0b60000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f50a0858000)
    /lib64/ld-linux-x86-64.so.2 (0x0000562209e1e000)
    libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f50a0619000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f50a0401000)

cmake ../src -DOPENCL_VERSION:STRING=2.0 -DBLAS_DEBUG_TOOLS=ON -DOPENCL_OFFLINE_BUILD_HAWAII_KERNEL=ON -DBUILD_PERFORMANCE=ON -DCMAKE_INSTALL_PREFIX=/opt/clBLAS -DBUILD_SHARED_LIBS=ON -DUSE_SYSTEM_GTEST=ON -DOPENCL_LIBRARIES=/usr/lib/libOpenCL.so.1 -DOPENCL_INCLUDE_DIRS=/opt/AMDAPPSDK-3.0/include -DCORR_TEST_WITH_ACML=OFF -DNetlib_BLAS_LIBRARY=/opt/netlib_blas/lib/libblas.so.1.0.1

make failed

Linking Fortran executable ../staging/test-correctness
CMakeFiles/test-correctness.dir/correctness/blas-lapack.c.o: In function `cdotu':
/home/marcin/Downloads/clBLAS/src/tests/correctness/blas-lapack.c:658: undefined reference to `cdotusub_'
CMakeFiles/test-correctness.dir/correctness/blas-lapack.c.o: In function `zdotu':
/home/marcin/Downloads/clBLAS/src/tests/correctness/blas-lapack.c:673: undefined reference to `zdotusub_'
CMakeFiles/test-correctness.dir/correctness/blas-lapack.c.o: In function `cdotc':
/home/marcin/Downloads/clBLAS/src/tests/correctness/blas-lapack.c:688: undefined reference to `cdotcsub_'
CMakeFiles/test-correctness.dir/correctness/blas-lapack.c.o: In function `zdotc':
/home/marcin/Downloads/clBLAS/src/tests/correctness/blas-lapack.c:703: undefined reference to `zdotcsub_'
collect2: error: ld returned 1 exit status
tests/CMakeFiles/test-correctness.dir/build.make:1514: recipe for target 'staging/test-correctness' failed
make[2]: *** [staging/test-correctness] Error 1
CMakeFiles/Makefile2:379: recipe for target 'tests/CMakeFiles/test-correctness.dir/all' failed
make[1]: *** [tests/CMakeFiles/test-correctness.dir/all] Error 2
Makefile:136: recipe for target 'all' failed
make: *** [all] Error 2

But when I installed libblas-dev (which pulled in libblas-common, libblas-dev, and libblas3), which as I understand it is Netlib's BLAS, and pointed cmake at /usr/lib/libblas/libblas.so.3.0, it built properly.

Then I ran test-short:

[----------] Global test environment tear-down
[==========] 10096 tests from 125 test cases ran. (599045 ms total)
[  PASSED  ] 10096 tests.

and test-functional, still with the first two tests commented out:

[----------] Global test environment tear-down
[==========] 713 tests from 5 test cases ran. (230487 ms total)
[  PASSED  ] 706 tests.
[  FAILED  ] 7 tests, listed below:
[  FAILED  ] ERROR.InvalidMemObject
[  FAILED  ] ERROR.InvalidValue
[  FAILED  ] ERROR.InvalidValuesymm
[  FAILED  ] ERROR.InvalidValuehemm
[  FAILED  ] THREAD.cgemm
[  FAILED  ] THREAD.dgemm
[  FAILED  ] THREAD.zgemm

Now I am running test-correctness and will update this comment with the results.

tingxingdong commented 8 years ago

LAPACK 3.6.0 added a folder named CBLAS, which includes routines named xdotusub. You may not need to go as far back as 3.0; downgrading to 3.5.0, which is a lot newer, may be enough.
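
A quick way to see which of the wrappers named in the linker errors a given libblas build actually exports, sketched under the assumption that binutils `nm` is installed. The helper name `missing_dotsub` is hypothetical; it reads `nm -D --defined-only` output on stdin and prints the symbols that are absent.

```shell
# Hypothetical helper: reads `nm -D --defined-only <lib>` output on stdin and
# prints each wrapper symbol from the linker errors that is NOT defined there.
missing_dotsub() {
    syms=$(cat)
    for s in cdotusub_ zdotusub_ cdotcsub_ zdotcsub_; do
        # nm prints the symbol name as the last field, e.g. "0000... T cdotusub_"
        printf '%s\n' "$syms" | grep -q "[[:space:]]${s}\$" || printf '%s\n' "$s"
    done
}

# Usage, with the library path taken from the cmake command in this thread:
# nm -D --defined-only /opt/netlib_blas/lib/libblas.so.1.0.1 | missing_dotsub
```

Empty output would mean all four wrappers are present; any names printed are the ones test-correctness would fail to link against.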

mpekalski commented 8 years ago

test-correctness crashed:

[ RUN      ] ColumnMajor_SmallRange/SYR2K.ssyr2k/261
clblasColumnMajor, clblasLower, clblasNoTrans
N = 17, K = 8
offA = 0, offB = 0, offC = 0
lda = 17, ldb = 17, ldc = 17
seed = 12345
queues = 1
Generating input data... Done
Calling reference xSYR2K routine... Done
Calling clblas xSYR2K routine... Segmentation fault (core dumped)

TimmyLiu commented 8 years ago

Hi, PR #214 should fix the test-functional failures. Can you try it out?

mpekalski commented 8 years ago

Looks like test-functional passes the initial tests, but it crashes later on

[ RUN      ] ERROR.InvalidCommandQueue
[       OK ] ERROR.InvalidCommandQueue (134 ms)
[ RUN      ] ERROR.InvalidEventWaitList
[       OK ] ERROR.InvalidEventWaitList (211 ms)
[ RUN      ] ERROR.InvalidMemObject
[       OK ] ERROR.InvalidMemObject (133 ms)
[ RUN      ] ERROR.InvalidValue
[       OK ] ERROR.InvalidValue (131 ms)
[ RUN      ] ERROR.InvalidDevice
[       OK ] ERROR.InvalidDevice (0 ms)
[ RUN      ] THREAD.dtrsm
OpenCL error -52 on line 1080
test-functional: /home/marcin/Downloads/clBLAS/src/library/blas/xtrsm.cc:1080: cl_int diag_dtrtri128(cl_command_queue, int, clblasUplo, clblasDiag, cl_mem, size_t, cl_mem, size_t, int, int, _cl_event**): Assertion `false' failed.
Aborted (core dumped)

Line 1080 is, like before, the error check:

CL_CHECK(err);
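
For reading these logs, the numeric codes can be mapped back to the names defined in the OpenCL header `CL/cl.h`. The lookup below is a minimal sketch covering only a few codes, including the -36 and -52 seen in this thread; the function name `cl_errname` is hypothetical.

```shell
# Hypothetical helper: translate a few OpenCL status codes into the
# constant names from CL/cl.h. Not an exhaustive table.
cl_errname() {
    case "$1" in
        0)   echo CL_SUCCESS ;;
        -30) echo CL_INVALID_VALUE ;;
        -36) echo CL_INVALID_COMMAND_QUEUE ;;
        -38) echo CL_INVALID_MEM_OBJECT ;;
        -52) echo CL_INVALID_KERNEL_ARGS ;;
        *)   echo "unknown ($1)" ;;
    esac
}

cl_errname -36   # code reported at xgemm.cc:350
cl_errname -52   # code reported at xtrsm.cc:1080
```

So the abort in diag_dtrtri128 corresponds to CL_INVALID_KERNEL_ARGS, which OpenCL returns when a kernel is enqueued without all of its arguments having been set.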

guacamoleo commented 8 years ago

We have just merged a fix into the develop branch which should resolve all GEMM thread-safety issues; please test, and open a new issue if the problem is not resolved.

mpekalski commented 8 years ago

Now when I build the project and run any of the tests, they run on the CPU instead of the GPU (GPU load is 0%, and one CPU core shows 100%). This did not happen before. I built the project with exactly the same parameters as before, and there were no issues with compiling or building in general.

Could this thread safety fix be the cause?

It really does run on the CPU instead of the GPU, and even on the CPU it uses only one core.

guacamoleo commented 8 years ago

With the fix, if each CPU thread uses its own OpenCL context, then each thread has to compile its own OpenCL kernels. That probably takes a lot longer than executing the kernels themselves, which is a possible reason why CPU usage is high and GPU usage is low.

mpekalski commented 8 years ago

I ran the tests; see the output below. I am still surprised that during both tests the GPU usage was very low, almost always showing 0% and only sometimes jumping to 3%.

CPU usage was 100% on one core during one of the larger tests:

Calling reference xTRSM routine... 

Then it switched to another core, which showed 100% usage, with the GPU constantly at 0%:

Calling reference xTRSM routine... Done
Calling clblas xTRSM routine... 

Regarding the tests:

test-functional

[----------] Global test environment tear-down
[==========] 715 tests from 5 test cases ran. (235142 ms total)
[  PASSED  ] 714 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] THREAD.dtrsm

details

[ RUN      ] THREAD.dtrsm
m : 0    n: 0
/clBLAS/src/tests/include/matrix.h:327: Failure
The difference between a and b is 654135552, which exceeds delta, where
a evaluates to 654135552,
b evaluates to 0, and
delta evaluates to 0.
[  FAILED  ] THREAD.dtrsm (1159 ms)

Then I ran it again:

./test-functional --gtest_filter=*dtrsm*
./test-medium --gtest_filter=*dtrsm*
./test-correctness --gtest_filter=*dtrsm*

It passed all of them. I do not know why it failed the first time.
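
Since the failure did not reproduce in a single filtered run, one option is to repeat the test many times and stop at the first failure. The loop below is a sketch of that idea; `repeat_until_fail` is a hypothetical helper name, and the run count of 20 is arbitrary. Google Test also has a built-in `--gtest_repeat=N` flag that serves a similar purpose inside one process.

```shell
# Hypothetical helper: run a command up to N times and stop at the first
# non-zero exit status, reporting which run failed.
repeat_until_fail() {
    n=$1; shift
    i=1
    while [ "$i" -le "$n" ]; do
        "$@" || { echo "failed on run $i"; return 1; }
        i=$((i + 1))
    done
    echo "passed $n runs"
}

# Usage against the one test that failed in the full run:
# repeat_until_fail 20 ./test-functional --gtest_filter='THREAD.dtrsm'
```

Each iteration starts a fresh process, so kernels are rebuilt every time; that may matter if the failure depends on state left over from tests that ran earlier in the full suite, in which case repeating the whole suite would be the closer reproduction.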

Test environment:

Device name: Hawaii
Device vendor: Advanced Micro Devices, Inc.
Platform (bit): Linux
clblas version: 2.11.0
Driver version: 1912.5 (VM)
Device version: OpenCL 2.0 AMD-APP (1912.5)
Global mem size: 3911 MB