CHIP-SPV / chipStar

chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.

Mali GPU: gets stuck at clFinish() in prepareDeviceVariables() #678

Closed pjaaskel closed 11 months ago

pjaaskel commented 12 months ago

With #676 applied, a simple compile test case gets stuck in a clFinish() call.

chiptest@odroid:~/src/chipStar/build-llvm-15$ gdb --args /home/chiptest/src/chipStar/build-llvm-15/catch/catch_tests/unit/deviceLib/SinglePrecisionIntrinsics/__cosf "Unit_deviceFunctions_CompileTest___cosf_float"
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/chiptest/src/chipStar/build-llvm-15/catch/catch_tests/unit/deviceLib/SinglePrecisionIntrinsics/__cosf...
(No debugging symbols found in /home/chiptest/src/chipStar/build-llvm-15/catch/catch_tests/unit/deviceLib/SinglePrecisionIntrinsics/__cosf)
(gdb) r
Starting program: /home/chiptest/src/chipStar/build-llvm-15/catch/catch_tests/unit/deviceLib/SinglePrecisionIntrinsics/__cosf Unit_deviceFunctions_CompileTest___cosf_float
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fb4687020 (LWP 261197)]
[New Thread 0x7fb3e77020 (LWP 261198)]
[New Thread 0x7fb3667020 (LWP 261199)]
[New Thread 0x7fb2e57020 (LWP 261200)]
[New Thread 0x7fb2647020 (LWP 261201)]
[New Thread 0x7fb1e37020 (LWP 261202)]
[New Thread 0x7fb1627020 (LWP 261203)]
[New Thread 0x7fb0e17020 (LWP 261204)]
CHIP warning [TID 261194] [1699455731.934338902] : The device might not support subgroup size 32, warp-size sensitive kernels might not work correctly.
Filters: Unit_deviceFunctions_CompileTest___cosf_float
...
(gdb) thread apply all bt

Thread 9 (Thread 0x7fb0e17020 (LWP 261204) "mali-cmar-backe"):
#0  0x0000007fb7a6bcf8 in __GI___poll (fds=0x7fb0e16630, nfds=4, timeout=<optimized out>) at ../sysdeps/unix/sysv/linux/poll.c:41
#1  0x0000007fb5ea1fd8 in ?? () from /usr/share/mali/libmali.so
#2  0x0000007fb7a0d5c8 in start_thread (arg=0x0) at ./nptl/pthread_create.c:442
#3  0x0000007fb7a75d9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

Thread 8 (Thread 0x7fb1627020 (LWP 261203) "mali-utility-wo"):
#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5555695c18) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=0, expected=0, futex_word=0x5555695c18) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x5555695c18, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x0000007fb7a159a4 in do_futex_wait (sem=sem@entry=0x5555695c18, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x0000007fb7a15a5c in __new_sem_wait_slow64 (sem=0x5555695c18, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x0000007fb5ea1d8c in ?? () from /usr/share/mali/libmali.so
#6  0x0000007fb7a0d5c8 in start_thread (arg=0x0) at ./nptl/pthread_create.c:442
#7  0x0000007fb7a75d9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

Thread 7 (Thread 0x7fb1e37020 (LWP 261202) "mali-utility-wo"):
#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5555695b70) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=0, expected=0, futex_word=0x5555695b70) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x5555695b70, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x0000007fb7a159a4 in do_futex_wait (sem=sem@entry=0x5555695b70, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x0000007fb7a15a5c in __new_sem_wait_slow64 (sem=0x5555695b70, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x0000007fb5ea1d8c in ?? () from /usr/share/mali/libmali.so
#6  0x0000007fb7a0d5c8 in start_thread (arg=0x0) at ./nptl/pthread_create.c:442
#7  0x0000007fb7a75d9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

Thread 6 (Thread 0x7fb2647020 (LWP 261201) "mali-utility-wo"):
#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5555695ac8) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=0, expected=0, futex_word=0x5555695ac8) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x5555695ac8, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x0000007fb7a159a4 in do_futex_wait (sem=sem@entry=0x5555695ac8, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x0000007fb7a15a5c in __new_sem_wait_slow64 (sem=0x5555695ac8, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x0000007fb5ea1d8c in ?? () from /usr/share/mali/libmali.so
#6  0x0000007fb7a0d5c8 in start_thread (arg=0x0) at ./nptl/pthread_create.c:442
#7  0x0000007fb7a75d9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

Thread 5 (Thread 0x7fb2e57020 (LWP 261200) "mali-utility-wo"):
#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5555695a20) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=0, expected=0, futex_word=0x5555695a20) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x5555695a20, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x0000007fb7a159a4 in do_futex_wait (sem=sem@entry=0x5555695a20, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x0000007fb7a15a5c in __new_sem_wait_slow64 (sem=0x5555695a20, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x0000007fb5ea1d8c in ?? () from /usr/share/mali/libmali.so
#6  0x0000007fb7a0d5c8 in start_thread (arg=0x0) at ./nptl/pthread_create.c:442
#7  0x0000007fb7a75d9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

Thread 4 (Thread 0x7fb3667020 (LWP 261199) "mali-utility-wo"):
#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x5555695978) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=0, expected=0, futex_word=0x5555695978) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x5555695978, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x0000007fb7a159a4 in do_futex_wait (sem=sem@entry=0x5555695978, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x0000007fb7a15a5c in __new_sem_wait_slow64 (sem=0x5555695978, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x0000007fb5ea1d8c in ?? () from /usr/share/mali/libmali.so
#6  0x0000007fb7a0d5c8 in start_thread (arg=0x0) at ./nptl/pthread_create.c:442
#7  0x0000007fb7a75d9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

Thread 3 (Thread 0x7fb3e77020 (LWP 261198) "mali-utility-wo"):
#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55556958d0) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=0, expected=0, futex_word=0x55556958d0) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55556958d0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x0000007fb7a159a4 in do_futex_wait (sem=sem@entry=0x55556958d0, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x0000007fb7a15a5c in __new_sem_wait_slow64 (sem=0x55556958d0, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x0000007fb5ea1d8c in ?? () from /usr/share/mali/libmali.so
#6  0x0000007fb7a0d5c8 in start_thread (arg=0x0) at ./nptl/pthread_create.c:442
#7  0x0000007fb7a75d9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

Thread 2 (Thread 0x7fb4687020 (LWP 261197) "mali-mem-purge"):
#0  __futex_abstimed_wait_common64 (private=<optimized out>, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x7fb4739480) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=<optimized out>, abstime=0x0, clockid=0, expected=0, futex_word=0x7fb4739480) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7fb4739480, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:139
#3  0x0000007fb7a159a4 in do_futex_wait (sem=sem@entry=0x7fb4739480, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#4  0x0000007fb7a15a5c in __new_sem_wait_slow64 (sem=0x7fb4739480, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#5  0x0000007fb5ec6158 in ?? () from /usr/share/mali/libmali.so
#6  0x0000007fb7a0d5c8 in start_thread (arg=0x0) at ./nptl/pthread_create.c:442
#7  0x0000007fb7a75d9c in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:79

Thread 1 (Thread 0x7fb7fc0020 (LWP 261194) "__cosf"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55557654d0) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (cancel=true, private=0, abstime=0x0, clockid=0, expected=0, futex_word=0x55557654d0) at ./nptl/futex-internal.c:87
#2  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55557654d0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x0000007fb7a0c8fc in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55557654d8, cond=0x55557654a8) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x55557654a8, mutex=0x55557654d8) at ./nptl/pthread_cond_wait.c:627
#5  0x0000007fb604331c in osup_sync_object_wait () from /usr/share/mali/libmali.so
#6  0x0000007fb4bdb190 in ?? () from /usr/share/mali/libmali.so
#7  0x0000007fb4bebe2c in ?? () from /usr/share/mali/libmali.so
#8  0x0000007fb4be60f4 in ?? () from /usr/share/mali/libmali.so
#9  0x0000007fb4be621c in ?? () from /usr/share/mali/libmali.so
#10 0x0000007fb4bb37ac in clFinish () from /usr/share/mali/libmali.so
#11 0x0000007fb7f58158 in CHIPQueueOpenCL::finish() () from /home/chiptest/src/chipStar/build-llvm-15/libCHIP.so
#12 0x0000007fb7ece7cc in chipstar::Module::prepareDeviceVariablesNoLock(chipstar::Device*, chipstar::Queue*) () from /home/chiptest/src/chipStar/build-llvm-15/libCHIP.so
#13 0x0000007fb7ed4204 in chipstar::Device::prepareDeviceVariables(HostPtr) () from /home/chiptest/src/chipStar/build-llvm-15/libCHIP.so
#14 0x0000007fb7f2892c in hipLaunchKernelInternal(void const*, dim3, dim3, void**, unsigned long, ihipStream_t*) () from /home/chiptest/src/chipStar/build-llvm-15/libCHIP.so
#15 0x0000007fb7f286e0 in hipLaunchKernel () from /home/chiptest/src/chipStar/build-llvm-15/libCHIP.so
#16 0x000000555556f96c in ____C_A_T_C_H____T_E_S_T____17() ()
#17 0x0000005555586910 in Catch::RunContext::runCurrentTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
#18 0x0000005555585ee0 in Catch::RunContext::runTest(Catch::TestCase const&) ()
#19 0x000000555558b240 in Catch::Session::runInternal() ()
#20 0x000000555558a22c in Catch::Session::run() ()
#21 0x00000055555a0e78 in main ()

pjaaskel commented 12 months ago

It passes sometimes, so this could be a race related to the event-based synchronization.

...
CHIP info [TID 339310] [1699538655.019354172] : 
Launching kernel __chip_reset_non_symbols
GridDim: <1, 1, 1> BlockDim: <1, 1, 1>
SharedMem: 0
NumArgs: 0

CHIP debug [TID 339310] [1699538655.019443719] : Setting LastEvent for 0x55aa4cc480 0x0 -> 0x55aa796e10
CHIP debug [TID 339310] [1699538655.019461678] : Tracking chipstar::Event 0x55aa41ec30 in Backend::Events

pjaaskel commented 12 months ago

Nope, I think it's a serious driver-related issue. I tried to run example1 from the PoCL examples. It works with PoCL-CPU on the Arm CPU of that SoC, but gets stuck with the Mali driver:

chiptest@odroid:~/src/pocl/build$ POCL_BUILDING=1 OCL_ICD_VENDORS=$PWD/ocl-vendors/pocl-tests.icd examples/example1/example1
(0.000000, 0.000000, 0.000000, 0.000000) . (0.000000, 0.000000, 0.000000, 0.000000) = 0.000000
(1.000000, 1.000000, 1.000000, 1.000000) . (1.000000, 1.000000, 1.000000, 1.000000) = 4.000000
(2.000000, 2.000000, 2.000000, 2.000000) . (2.000000, 2.000000, 2.000000, 2.000000) = 16.000000
(3.000000, 3.000000, 3.000000, 3.000000) . (3.000000, 3.000000, 3.000000, 3.000000) = 36.000000
...
chiptest@odroid:~/src/pocl/build$ OCL_ICD_VENDORS=/etc/OpenCL/vendors/mali.icd examples/example1/example1 

^C^C^C
chiptest@odroid:~/src/pocl/build$ OCL_ICD_VENDORS=/etc/OpenCL/vendors/mali.icd examples/example1/example1 
(0.000000, 0.000000, 0.000000, 0.000000) . (0.000000, 0.000000, 0.000000, 0.000000) = 0.000000
(1.000000, 1.000000, 1.000000, 1.000000) . (1.000000, 1.000000, 1.000000, 1.000000) = 0.000000
FAIL
chiptest@odroid:~/src/pocl/build$ OCL_ICD_VENDORS=/etc/OpenCL/vendors/mali.icd examples/example1/example1 
(0.000000, 0.000000, 0.000000, 0.000000) . (0.000000, 0.000000, 0.000000, 0.000000) = 0.000000
(1.000000, 1.000000, 1.000000, 1.000000) . (1.000000, 1.000000, 1.000000, 1.000000) = 0.000000
FAIL
chiptest@odroid:~/src/pocl/build$ OCL_ICD_VENDORS=/etc/OpenCL/vendors/mali.icd examples/example1/example1 

It either gets stuck or finishes with wrong results: the second dot product prints 0.000000 instead of 4.000000 and the run ends with FAIL.
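
To put numbers on how often each outcome happens, a small harness could rerun the binary under a timeout and tally the results. This is only a sketch and is not part of chipStar or PoCL. The helper names `classify_run` and `tally_runs` are hypothetical, and it assumes that a bad run either prints FAIL or exits non-zero (the exit code of example1 is not shown above).

```python
import subprocess
from collections import Counter


def classify_run(cmd, timeout_s, env=None):
    """Run cmd once and return 'hang', 'fail', or 'pass'."""
    try:
        proc = subprocess.run(
            cmd, env=env, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising the timeout.
        return "hang"
    # Assumption: a wrong result shows up as "FAIL" in the output
    # or as a non-zero exit status.
    if proc.returncode != 0 or "FAIL" in proc.stdout + proc.stderr:
        return "fail"
    return "pass"


def tally_runs(cmd, runs, timeout_s, env=None):
    """Repeat the command and count how often each outcome occurs."""
    return Counter(classify_run(cmd, timeout_s, env) for _ in range(runs))
```

For the case above, this could be called as `tally_runs(["examples/example1/example1"], runs=20, timeout_s=30, env=dict(os.environ, OCL_ICD_VENDORS="/etc/OpenCL/vendors/mali.icd"))` from the PoCL build directory, with `os` imported. The run count and timeout are arbitrary choices. If a process is stuck in the kernel driver and cannot be killed, the harness itself may block while reaping it.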