CNugteren / CLBlast

Tuned OpenCL BLAS
Apache License 2.0

Segfaults with double precision tuning on Fiji #107

Closed (ghost closed 7 years ago)

ghost commented 8 years ago

Calling clblast_tuner_xgemm with -platform 0 -device 0 -precision 64 (double) -m 1024 [=default] -n 1024 [=default] -k 1024 [=default] -alpha 2.000000 [=default] -beta 2.000000 [=default] -fraction 512.000000 results in a SIGSEGV. I recall an earlier issue indicating that segfaults like this are caused by the driver. I'm currently using Catalyst 15.12 with an R9 Nano.

GNU gdb (GDB) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./clblast_tuner_xgemm...(no debugging symbols found)...done.
(gdb) run -platform 0 -device 0 -precision 64 -m 1024 -n 1024 -k 1024 -alpha 2.00000 -beta 2.00000 -fraction 512.00000 
Starting program: /home/neptune/devel/clblast/build/clblast_tuner_xgemm -platform 0 -device 0 -precision 64 -m 1024 -n 1024 -k 1024 -alpha 2.00000 -beta 2.00000 -fraction 512.00000
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
* Options given/available:
    -platform 0 [=default]
    -device 0 [=default]
    -precision 64 (double) 
    -m 1024 [=default]
    -n 1024 [=default]
    -k 1024 [=default]
    -alpha 2.000000 [=default]
    -beta 2.000000 [=default]
    -fraction 512.000000 

Error: No root privilege. Please check with the system-admin.
Error: No root privilege. Please check with the system-admin.
[New Thread 0x7fffed6dc700 (LWP 8540)]

[==========] Initializing on platform 0 device 0
[==========] Device name: 'Fiji' (OpenCL 2.0 AMD-APP (1912.5))

[----------] Testing reference Xgemm
[ RUN      ] Running Xgemm
[       OK ] Completed Xgemm (10.9 ms) - 1 out of 1

[----------] Testing kernel Xgemm
[ RUN      ] Running Xgemm
[       OK ] Completed Xgemm (5.7 ms) - 1 out of 1

[----------] Printing results to stdout
[ RESULT   ] Xgemm;      5.7 ms;   MWG 32;   NWG 64;   KWG 32; MDIMC 32;  NDIMC 8; MDIMA 32;  NDIMB 8;    KWI 2;    VWM 1;    VWN 2;   STRM 0;   STRN 0;     SA 1;     SB 1;PRECISION 64;

[----------] Printing best result to stdout
[     BEST ] Xgemm;      5.7 ms;   MWG 32;   NWG 64;   KWG 32; MDIMC 32;  NDIMC 8; MDIMA 32;  NDIMB 8;    KWI 2;    VWM 1;    VWN 2;   STRM 0;   STRN 0;     SA 1;     SB 1;PRECISION 64;

[----------] Printing best result in database format to stdout
{ "Fiji", { {"MWG",32}, {"NWG",64}, {"KWG",32}, {"MDIMC",32}, {"NDIMC",8}, {"MDIMA",32}, {"NDIMB",8}, {"KWI",2}, {"VWM",1}, {"VWN",2}, {"STRM",0}, {"STRN",0}, {"SA",1}, {"SB",1}, {"PRECISION",64} } }
[ -------> ] 5.7 ms or 375.0 GFLOPS

[----------] Printing results to file in JSON format

[==========] End of the tuning process

[Thread 0x7fffed6dc700 (LWP 8540) exited]
* Options given/available:
    -platform 0 [=default]
    -device 0 [=default]
    -precision 64 (double) 
    -m 1024 [=default]
    -n 1024 [=default]
    -k 1024 [=default]
    -alpha 2.000000 [=default]
    -beta 2.000000 [=default]
    -fraction 512.000000 [=default]

[New Thread 0x7fffed6dc700 (LWP 8550)]

[==========] Initializing on platform 0 device 0
[==========] Device name: 'Fiji' (OpenCL 2.0 AMD-APP (1912.5))

[----------] Testing reference Xgemm
[ RUN      ] Running Xgemm
[       OK ] Completed Xgemm (10.9 ms) - 1 out of 1

[----------] Testing kernel Xgemm
[ RUN      ] Running Xgemm
[       OK ] Completed Xgemm (6.6 ms) - 1 out of 213
[ RUN      ] Running Xgemm
[       OK ] Completed Xgemm (7.8 ms) - 2 out of 213
[ RUN      ] Running Xgemm
[       OK ] Completed Xgemm (7.3 ms) - 3 out of 213
[ RUN      ] Running Xgemm
[       OK ] Completed Xgemm (7.9 ms) - 4 out of 213
[ RUN      ] Running Xgemm
[       OK ] Completed Xgemm (6.3 ms) - 5 out of 213
[ RUN      ] Running Xgemm
[       OK ] Completed Xgemm (10.8 ms) - 6 out of 213

Thread 1 "clblast_tuner_x" received signal SIGSEGV, Segmentation fault.
0x00007ffff32509be in ?? () from /usr/lib/libamdocl64.so
(gdb) backtrace
#0  0x00007ffff32509be in ?? () from /usr/lib/libamdocl64.so
#1  0x00007ffff325e5b4 in ?? () from /usr/lib/libamdocl64.so
#2  0x00007ffff325fdfa in ?? () from /usr/lib/libamdocl64.so
#3  0x00007ffff3260b9b in ?? () from /usr/lib/libamdocl64.so
#4  0x00007ffff2e56b80 in ?? () from /usr/lib/libamdocl64.so
#5  0x00007ffff2e5a7f4 in ?? () from /usr/lib/libamdocl64.so
#6  0x00007ffff2e5afc1 in ?? () from /usr/lib/libamdocl64.so
#7  0x00007ffff320c195 in ?? () from /usr/lib/libamdocl64.so
#8  0x00007fffef3760d7 in amdcl::scCompileSIImpl::CompileOnce(amdcl::_il_binary_rec&, unsigned long&) () from /usr/lib/libamdocl12cl64.so
#9  0x00007fffef37610d in amdcl::scCompileSI::Compile(amdcl::_il_binary_rec&, unsigned long&) () from /usr/lib/libamdocl12cl64.so
#10 0x00007fffef379992 in amdcl::AMDIL::assemble(amdcl::_il_binary_rec*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, amdcl::scCompileBase*) ()
   from /usr/lib/libamdocl12cl64.so
#11 0x00007fffef37a7cd in amdcl::AMDIL::compile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, amdcl::scCompileBase*) () from /usr/lib/libamdocl12cl64.so
#12 0x00007fffef379070 in amdcl::AMDIL::compile(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) () from /usr/lib/libamdocl12cl64.so
#13 0x00007fffeee96ab4 in AMDILFEToISA(_acl_loader_data_0_8**, char const*, unsigned long) () from /usr/lib/libamdocl12cl64.so
#14 0x00007fffeee988b5 in if_aclCompile(_acl_compiler_rec_0_8_1*, _acl_bif_rec_0_8_1*, char const*, _acl_type_enum_0_8, _acl_type_enum_0_8, void (*)(char const*, unsigned long)) ()
   from /usr/lib/libamdocl12cl64.so
#15 0x00007ffff338c5d9 in aclCompile () from /usr/lib/libamdocl64.so
#16 0x00007ffff2a5c0e8 in ?? () from /usr/lib/libamdocl64.so
#17 0x00007ffff2a5c760 in ?? () from /usr/lib/libamdocl64.so
#18 0x00007ffff2a6b112 in ?? () from /usr/lib/libamdocl64.so
#19 0x00007ffff2a6d92a in ?? () from /usr/lib/libamdocl64.so
#20 0x00007ffff2a1ab4f in ?? () from /usr/lib/libamdocl64.so
#21 0x00007ffff2a2a8ec in ?? () from /usr/lib/libamdocl64.so
#22 0x00007ffff2a0b9d0 in clBuildProgram () from /usr/lib/libamdocl64.so
#23 0x00007ffff77b83a3 in cltune::TunerImpl::RunKernel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cltune::KernelInfo const&, unsigned long, unsigned long) () from /usr/lib/libcltune.so
#24 0x00007ffff77ba661 in cltune::TunerImpl::Tune() () from /usr/lib/libcltune.so
#25 0x000000000042cd49 in void clblast::Tuner<clblast::TuneXgemm<double, 2>, double>(int, char**) ()
#26 0x000000000040c9aa in main ()
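As a sanity check on the tuner output in the log above, the reported throughput can be reproduced by hand. This is a minimal sketch that assumes the tuner counts 2*m*n*k floating-point operations per GEMM (the usual convention, not confirmed in this thread); the helper name `gemm_gflops` is made up for illustration.

```python
def gemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Return GFLOPS for one GEMM, assuming 2*m*n*k floating-point operations."""
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    return flops / seconds / 1e9


# Values taken from the log: m = n = k = 1024, best kernel time printed as 5.7 ms.
gflops = gemm_gflops(1024, 1024, 1024, 5.7e-3)
print(f"{gflops:.1f} GFLOPS")
```

With the rounded 5.7 ms this gives roughly 377 GFLOPS, close to the 375.0 GFLOPS in the log; the small gap is consistent with the printed time being rounded to one decimal place.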

CNugteren commented 8 years ago

Are you testing on the development branch? Some issues may already be fixed there. I would also suggest omitting the -fraction argument, so that the first set of tuning tests runs in full (those parameters are the most likely to yield good results anyway).

ghost commented 8 years ago

I was testing on master, but a similar error shows up even on development. It looks like it's segfaulting inside the compiler.
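One way to back up the "inside the compiler" reading is to locate the public OpenCL call through which the crash was reached. The sketch below parses gdb backtrace text and reports the innermost frame that looks like an OpenCL API entry point. The helper names `parse_backtrace` and `opencl_entry_point` are hypothetical, the sample is an excerpt of the frames posted above, and the `cl[A-Z]...` naming pattern is a heuristic assumption rather than a rule.

```python
import re

SAMPLE = """
#0  0x00007ffff32509be in ?? () from /usr/lib/libamdocl64.so
#14 0x00007fffeee988b5 in if_aclCompile(_acl_compiler_rec_0_8_1*, _acl_bif_rec_0_8_1*, char const*, _acl_type_enum_0_8, _acl_type_enum_0_8, void (*)(char const*, unsigned long)) ()
   from /usr/lib/libamdocl12cl64.so
#15 0x00007ffff338c5d9 in aclCompile () from /usr/lib/libamdocl64.so
#22 0x00007ffff2a0b9d0 in clBuildProgram () from /usr/lib/libamdocl64.so
#24 0x00007ffff77ba661 in cltune::TunerImpl::Tune() () from /usr/lib/libcltune.so
#26 0x000000000040c9aa in main ()
"""


def parse_backtrace(text):
    """Return a list of (frame_number, function, library) tuples from gdb output."""
    # gdb wraps long frames; join continuation lines onto the frame they belong to.
    joined = []
    for line in text.splitlines():
        if line.startswith("#"):
            joined.append(line.strip())
        elif joined and line.strip():
            joined[-1] += " " + line.strip()

    frames = []
    for line in joined:
        match = re.match(r"#(\d+)\s+(?:0x[0-9a-f]+ in )?(.*)", line)
        number, rest = int(match.group(1)), match.group(2)
        lib_match = re.search(r" from (\S+)$", rest)
        library = lib_match.group(1) if lib_match else None
        if lib_match:
            rest = rest[:lib_match.start()]
        # Without debug info gdb prints an empty argument list " ()" after the name.
        function = rest[:-3] if rest.endswith(" ()") else rest
        frames.append((number, function, library))
    return frames


def opencl_entry_point(frames):
    """Return the innermost frame that looks like a public OpenCL API call."""
    for number, function, library in frames:
        if re.fullmatch(r"cl[A-Z]\w*", function):
            return number, function, library
    return None


frames = parse_backtrace(SAMPLE)
print(opencl_entry_point(frames))
```

For the excerpt this reports frame 22, `clBuildProgram`, which matches the observation that the fault occurs while the driver is compiling a kernel rather than while one is running.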

MigMuc commented 8 years ago

I don't know if the errors I get tuning the Xgemm with double precision are related to the above-mentioned problem, but I will post them here first.

[ RUN ] Running Xgemm
[ OK ] Completed Xgemm (25.6 ms) - 167 out of 213
device compiler error/warning: Error:E012:Insufficient Local Resources!

[ FAILED ] Kernel Xgemm failed
[ FAILED ] catched exception: device compiler error/warning occurred

[ FAILED ] Xgemm; 0.0 ms; MWG 32; NWG 128; KWG 32; MDIMC 16; NDIMC 16; MDIMA 32; NDIMB 16; KWI 2; VWM 1; VWN 4; STRM 0; STRN 1; SA 1; SB 1;PRECISION 64;
[ RUN ] Running Xgemm
[ OK ] Completed Xgemm (17.0 ms) - 169 out of 213

Maybe it is related to the driver; note that my driver version differs from the one above: Device name: 'Tonga' (OpenCL 2.0 AMD-APP (1800.8))

There is one thread related to this error at the Khronos forums.
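A rough estimate suggests why the E012 error might appear for that particular configuration. The sketch below assumes (this is not confirmed anywhere in the thread) that SA = 1 and SB = 1 mean a KWG x MWG tile of A and a KWG x NWG tile of B are staged in local memory, and that the device allows 32 KiB of local memory per work-group. Both the helper `xgemm_local_mem_bytes` and the limit are illustrative; the real limit should be read from `CL_DEVICE_LOCAL_MEM_SIZE`.

```python
def xgemm_local_mem_bytes(params: dict, bytes_per_element: int) -> int:
    """Estimate local memory for the cached A and B tiles (assumed layout)."""
    a_tile = params["SA"] * params["KWG"] * params["MWG"]  # elements of A held locally
    b_tile = params["SB"] * params["KWG"] * params["NWG"]  # elements of B held locally
    return (a_tile + b_tile) * bytes_per_element


ASSUMED_LIMIT = 32 * 1024  # assumption; query CL_DEVICE_LOCAL_MEM_SIZE for the real value
DOUBLE = 8                 # bytes per element for -precision 64

# Parameters copied from the FAILED line above (Tonga).
failed = {"MWG": 32, "NWG": 128, "KWG": 32, "SA": 1, "SB": 1}
# Parameters copied from the BEST line in the first post (Fiji).
passed = {"MWG": 32, "NWG": 64, "KWG": 32, "SA": 1, "SB": 1}

for label, params in (("failed", failed), ("passed", passed)):
    used = xgemm_local_mem_bytes(params, DOUBLE)
    print(f"{label}: {used} bytes, fits assumed limit: {used <= ASSUMED_LIMIT}")
```

Under these assumptions the failing configuration needs 40960 bytes while the configuration that ran on Fiji needs 24576 bytes, so the failure would be a resource limit hit at compile time rather than the compiler crash reported in the first post.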

CNugteren commented 7 years ago

Is this still an issue with the latest AMD compilers? And do you think it is related to CLBlast at all?

ghost commented 7 years ago

I haven't seen any updates to Catalyst, so this should still be the case.

MigMuc commented 7 years ago

In this thread https://github.com/clMathLibraries/clBLAS/issues/207 there is a solution to the issue you are describing. I managed to get things working after linking against the libOpenCL provided by the driver, not the one provided by the AMD APP.
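To see which libOpenCL a binary actually resolves to, the output of `ldd` can be inspected. The following is a small sketch: `opencl_libraries` is a hypothetical helper, and the sample `ldd` output (including the `/opt/example-sdk` path and the addresses) is invented for illustration, not taken from either machine in this thread.

```python
import subprocess

# Invented ldd output for illustration only.
SAMPLE = """\
\tlinux-vdso.so.1 (0x00007ffc00000000)
\tlibOpenCL.so.1 => /opt/example-sdk/lib/libOpenCL.so.1 (0x00007f0000000000)
\tlibc.so.6 => /usr/lib/libc.so.6 (0x00007f0000200000)
"""


def opencl_libraries(ldd_output: str) -> list:
    """Return the resolved target of every libOpenCL entry in ldd output."""
    paths = []
    for line in ldd_output.splitlines():
        name, arrow, target = line.strip().partition(" => ")
        if arrow and name.startswith("libOpenCL"):
            # Drop the trailing load address, e.g. " (0x00007f...)".
            paths.append(target.split(" (")[0])
    return paths


def ldd(binary: str) -> str:
    """Run ldd on a binary and return its text output."""
    return subprocess.run(["ldd", binary], capture_output=True, text=True, check=True).stdout


print(opencl_libraries(SAMPLE))
```

On a real system, `opencl_libraries(ldd("./clblast_tuner_xgemm"))` would show whether the tuner picks up the driver's library or another copy; an unresolved entry appears as `not found`.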

CNugteren commented 7 years ago

Are you sure that issue is related? Also, you two might be discussing different issues: @akssri can you try @MigMuc's suggestion and link to a different libOpenCL and see if that works?

Still I think AMD's compiler shouldn't segfault, so it is probably a good idea to report this to AMD.

ghost commented 7 years ago

@MigMuc libOpenCL is currently only being provided by AMDAPP on my machine. It looks like the bug in clMathLibraries/clBLAS#207 has something to do with multiple CPU threads?

@CNugteren Yes, this should definitely be AMD's task to fix; feel free to close this (or perhaps add some archival flag). I'll try to dump the kernel that is causing this issue and file a bug report with AMD.