I have an AMD Vega 64 GPU with the newest ROCm 2.10 driver on a headless Ubuntu 18.04.3 Linux server.
uname -a
Linux 4.15.0-70-generic #79-Ubuntu SMP Tue Nov 12 10:36:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
See the clinfo and rocminfo output at the bottom for more information.
My OpenCL program always freezes at the call to clCreateCommandQueue.
I compiled the program with gcc and ran it in GDB. The output below seems to show a deadlock in the ROCm OpenCL implementation.
However, other, simpler OpenCL example programs that are not multithreaded and are not compiled with the gcc -pthread option run fine on the same machine.
So the hardware does not seem to be the problem. It must be something in the compilation, the kernel, or the software that causes this futex deadlock at run time.
Please help me fix this problem. I have spent the last few days trying to figure out what is wrong, without any success. This bug is very frustrating!
Thanks in advance for helping me solve this problem!
Thread 1 "myopencl" received signal SIGINT, Interrupt.
0x00007ffff77396d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x555555989318) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
205 ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.
(gdb) bt
0 0x00007ffff77396d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x555555989318) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
1 do_futex_wait (sem=sem@entry=0x555555989318, abstime=0x0) at sem_waitcommon.c:111
2 0x00007ffff77397c8 in __new_sem_wait_slow (sem=0x555555989318, abstime=0x0) at sem_waitcommon.c:181
3 0x00007ffff0e44a70 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
4 0x00007ffff0e448b9 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
5 0x00007ffff0e57d23 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
6 0x00007ffff0e291d9 in clCreateCommandQueueWithProperties () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
7 0x00007ffff0e29469 in clCreateCommandQueue () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
8 0x0000555555559327 in main (argc=, argv=) at myopencl.c:9519
(gdb) thread apply all bt
Thread 10 (Thread 0x7ffedbfff700 (LWP 10072)):
0 0x00007ffff77369f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7ffee7dfad14) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502
2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655
3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
5 0x00007ffff77306db in start_thread (arg=0x7ffedbfff700) at pthread_create.c:463
6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 9 (Thread 0x7ffee0ffe700 (LWP 10071)):
0 0x00007ffff77369f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7ffee7dfad14) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502
2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655
3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
5 0x00007ffff77306db in start_thread (arg=0x7ffee0ffe700) at pthread_create.c:463
6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 8 (Thread 0x7ffee17ff700 (LWP 10070)):
0 0x00007ffff77369f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7ffee7dfad14) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502
2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655
3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
5 0x00007ffff77306db in start_thread (arg=0x7ffee17ff700) at pthread_create.c:463
6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 7 (Thread 0x7fffecf61700 (LWP 10069)):
0 0x00007ffff77369f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7ffee7dfad10) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502
2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655
3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
5 0x00007ffff77306db in start_thread (arg=0x7fffecf61700) at pthread_create.c:463
6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 6 (Thread 0x7fffed762700 (LWP 10068)):
0 0x00007ffff77369f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7ffee7dfad10) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502
2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655
3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
5 0x00007ffff77306db in start_thread (arg=0x7fffed762700) at pthread_create.c:463
6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 5 (Thread 0x7fffedf63700 (LWP 10067)):
0 0x00007ffff77369f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7ffee7dfad10) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502
2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655
3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
5 0x00007ffff77306db in start_thread (arg=0x7fffedf63700) at pthread_create.c:463
6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 4 (Thread 0x7fffee764700 (LWP 10066)):
0 0x00007ffff77369f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7ffee7dfad10) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502
2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655
3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
5 0x00007ffff77306db in start_thread (arg=0x7fffee764700) at pthread_create.c:463
6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 3 (Thread 0x7fffeef65700 (LWP 10065)):
0 0x00007ffff77369f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7ffee7dfad10) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502
2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655
3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
4 0x00007ffee6370552 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so
5 0x00007ffff77306db in start_thread (arg=0x7fffeef65700) at pthread_create.c:463
6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 2 (Thread 0x7fffefa86700 (LWP 10064)):
0 0x00007ffff6f835d7 in ioctl () at ../sysdeps/unix/syscall-template.S:78
1 0x00007ffff0882f28 in kmtIoctl () from /opt/rocm/lib/libhsakmt.so.1
2 0x00007ffff087d36f in hsaKmtWaitOnMultipleEvents () from /opt/rocm/lib/libhsakmt.so.1
3 0x00007ffff0af2fd3 in core::Signal::WaitAny(unsigned int, hsa_signal_s const, hsa_signal_condition_t const, long const, unsigned long, hsa_wait_state_t, long) ()
from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
4 0x00007ffff0adbdf6 in AMD::hsa_amd_signal_wait_any(unsigned int, hsa_signal_s, hsa_signal_condition_t, long, unsigned long, hsa_wait_state_t, long) () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
5 0x00007ffff0aeb48a in core::Runtime::AsyncEventsLoop(void*) () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
6 0x00007ffff0aae797 in os::ThreadTrampoline(void*) () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
7 0x00007ffff77306db in start_thread (arg=0x7fffefa86700) at pthread_create.c:463
8 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 1 (Thread 0x7ffff150ef00 (LWP 10058)):
0 0x00007ffff77396d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x555555989318) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
1 do_futex_wait (sem=sem@entry=0x555555989318, abstime=0x0) at sem_waitcommon.c:111
2 0x00007ffff77397c8 in __new_sem_wait_slow (sem=0x555555989318, abstime=0x0) at sem_waitcommon.c:181
3 0x00007ffff0e44a70 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
4 0x00007ffff0e448b9 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
5 0x00007ffff0e57d23 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
6 0x00007ffff0e291d9 in clCreateCommandQueueWithProperties () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
7 0x00007ffff0e29469 in clCreateCommandQueue () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so
8 0x0000555555559327 in main (argc=, argv=) at myopencl.c:9519
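In the backtrace, the main thread (Thread 1) is blocked in __new_sem_wait_slow with abstime=0x0, i.e. it waits on a semaphore inside clCreateCommandQueue with no timeout, so the program freezes silently. A watchdog does not fix the underlying deadlock, but it turns the freeze into a reportable error. The following is a minimal sketch, assuming POSIX threads (build with gcc -pthread). The names call_with_timeout, watched_call and simulated_hang are hypothetical and introduced here for illustration; they are not part of OpenCL or ROCm. The sketch runs a possibly hanging call on a worker thread and waits for it with pthread_cond_timedwait.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() */
#endif
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* State shared between the waiting caller and the worker thread. */
struct watched_call {
    void (*fn)(void *);   /* the call that may hang */
    void *arg;
    int done;             /* set to 1 by the worker once fn returns */
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

static void *watched_worker(void *p) {
    struct watched_call *c = p;
    c->fn(c->arg);
    pthread_mutex_lock(&c->lock);
    c->done = 1;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Hypothetical helper: runs fn(arg) on a worker thread and waits at most
   timeout_sec seconds. Returns 0 if fn returned in time, ETIMEDOUT if it
   did not (or if the wait itself failed), or an errno value if the worker
   could not be started. On timeout the worker is detached and its state is
   deliberately leaked, because a thread stuck inside a driver cannot be
   cancelled safely. */
int call_with_timeout(void (*fn)(void *), void *arg, int timeout_sec) {
    struct watched_call *c = malloc(sizeof *c);
    if (c == NULL)
        return ENOMEM;
    c->fn = fn;
    c->arg = arg;
    c->done = 0;
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);

    pthread_t tid;
    int rc = pthread_create(&tid, NULL, watched_worker, c);
    if (rc != 0) {
        pthread_mutex_destroy(&c->lock);
        pthread_cond_destroy(&c->cond);
        free(c);
        return rc;
    }

    /* pthread_cond_timedwait() expects an absolute CLOCK_REALTIME deadline. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&c->lock);
    while (!c->done) {
        /* The loop absorbs spurious wakeups; any non-zero result
           (normally ETIMEDOUT) ends the wait. */
        if (pthread_cond_timedwait(&c->cond, &c->lock, &deadline) != 0)
            break;
    }
    int finished = c->done;
    pthread_mutex_unlock(&c->lock);

    if (!finished) {
        pthread_detach(tid);
        return ETIMEDOUT;
    }
    pthread_join(tid, NULL);
    pthread_mutex_destroy(&c->lock);
    pthread_cond_destroy(&c->cond);
    free(c);
    return 0;
}

/* Stand-in for a hanging call, for trying the watchdog without a GPU:
   sleeps for *(unsigned *)arg seconds. */
void simulated_hang(void *arg) {
    sleep(*(unsigned *)arg);
}
```

To use it with the program from the question, the clCreateCommandQueue call would be moved into a small wrapper with the signature void f(void *arg), for example one taking a hypothetical struct that holds the context, the device and the returned queue. If call_with_timeout returns ETIMEDOUT, the program can print a diagnostic and exit instead of freezing.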
Here is the output of clinfo:
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3019.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: Vega 10 XT [Radeon RX Vega 64]
Device Topology: PCI[ B#13, D#0, F#0 ]
Max compute units: 64
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1630Mhz
Address bits: 64
Max memory allocation: 7287183769
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 26751
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 8573157376
Constant buffer size: 7287183769
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 2992216473
Max global variable size: 7287183769
Max global variable preferred total size: 8573157376
Max read/write image args: 64
Max on device events: 1024
Queue on device max size: 8388608
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x7f450a801d50
Name: gfx900
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 3019.0 (HSA1.1,LC)
Profile: FULL_PROFILE
Version: OpenCL 2.0
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
Here is the output of rocminfo:
ROCk module is loaded
linuxperia is member of video group
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
==========
HSA Agents
==========
Agent 1
Name: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Marketing Name: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4200
BDFID: 0
Internal Node ID: 0
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32895880(0x1f5f388) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32895880(0x1f5f388) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
ISA Info:
N/A
Agent 2
Name: gfx900
Marketing Name: Vega 10 XT [Radeon RX Vega 64]
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 4096(0x1000)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
Chip ID: 26751(0x687f)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1630
BDFID: 3328
Internal Node ID: 1
Compute Unit: 64
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8372224(0x7fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Acessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx900
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
Done