ROCm / ROCm-OpenCL-Runtime

ROCm OpenCL Runtime

ROCm OpenCL freezes on Linux when calling clCreateCommandQueue #93

Open LinuXperia opened 4 years ago

LinuXperia commented 4 years ago

Hi all.

I have an AMD Vega 64 GPU with the newest ROCm 2.10 driver on a headless Ubuntu 18.04.03 Linux server.

Output of uname -a: Linux 4.15.0-70-generic #79-Ubuntu SMP Tue Nov 12 10:36:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

See the clinfo and rocminfo output at the bottom for more information.

My OpenCL program always freezes at the call to clCreateCommandQueue.

I ran my program in GDB after compiling it with gcc, and the output below suggests a deadlock bug in the ROCm OpenCL implementation.

Other, simpler, non-multithreaded OpenCL example programs that are not compiled with the gcc pthreads option run fine on the same machine, however.

So the hardware is not the problem. It has to be something in the combination of compilation options, kernel, and software that causes the futex run-time deadlock in the program.

Please help me fix this problem. I have spent the last few days trying to figure out what it is, without any success. This bug is very frustrating!

Thanks in advance for helping me solve this problem!

Thread 1 "myopencl" received signal SIGINT, Interrupt.

0x00007ffff77396d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x555555989318) at ../sysdeps/unix/sysv/linux/futex-internal.h:205

205 ../sysdeps/unix/sysv/linux/futex-internal.h: No such file or directory.

(gdb) bt

0 0x00007ffff77396d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x555555989318) at ../sysdeps/unix/sysv/linux/futex-internal.h:205

1 do_futex_wait (sem=sem@entry=0x555555989318, abstime=0x0) at sem_waitcommon.c:111

2 0x00007ffff77397c8 in __new_sem_wait_slow (sem=0x555555989318, abstime=0x0) at sem_waitcommon.c:181

3 0x00007ffff0e44a70 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so

4 0x00007ffff0e448b9 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so

5 0x00007ffff0e57d23 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so

6 0x00007ffff0e291d9 in clCreateCommandQueueWithProperties () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so

7 0x00007ffff0e29469 in clCreateCommandQueue () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so

8 0x0000555555559327 in main (argc=, argv=) at myopencl.c:9519

(gdb) thread apply all bt

Thread 10 (Thread 0x7ffedbfff700 (LWP 10072)):

0 0x00007ffff77369f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7ffee7dfad14) at ../sysdeps/unix/sysv/linux/futex-internal.h:88

1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502

2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655

3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so

4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so

5 0x00007ffff77306db in start_thread (arg=0x7ffedbfff700) at pthread_create.c:463

6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 9 (Thread 0x7ffee0ffe700 (LWP 10071)):

0 0x00007ffff77369f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7ffee7dfad14) at ../sysdeps/unix/sysv/linux/futex-internal.h:88

1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502

2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655

3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so

4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so

5 0x00007ffff77306db in start_thread (arg=0x7ffee0ffe700) at pthread_create.c:463

6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 8 (Thread 0x7ffee17ff700 (LWP 10070)):

0 0x00007ffff77369f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7ffee7dfad14) at ../sysdeps/unix/sysv/linux/futex-internal.h:88

1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502

2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655

3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so

4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so

5 0x00007ffff77306db in start_thread (arg=0x7ffee17ff700) at pthread_create.c:463

6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 7 (Thread 0x7fffecf61700 (LWP 10069)):

0 0x00007ffff77369f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7ffee7dfad10) at ../sysdeps/unix/sysv/linux/futex-internal.h:88

1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502

2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655

3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so

4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so

5 0x00007ffff77306db in start_thread (arg=0x7fffecf61700) at pthread_create.c:463

6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7fffed762700 (LWP 10068)):

0 0x00007ffff77369f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7ffee7dfad10) at ../sysdeps/unix/sysv/linux/futex-internal.h:88

1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502

2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655

3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so

4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so

5 0x00007ffff77306db in start_thread (arg=0x7fffed762700) at pthread_create.c:463

6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7fffedf63700 (LWP 10067)):

0 0x00007ffff77369f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7ffee7dfad10) at ../sysdeps/unix/sysv/linux/futex-internal.h:88

1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502

2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655

3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so

4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so


5 0x00007ffff77306db in start_thread (arg=0x7fffedf63700) at pthread_create.c:463

6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7fffee764700 (LWP 10066)):

0 0x00007ffff77369f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7ffee7dfad10) at ../sysdeps/unix/sysv/linux/futex-internal.h:88

1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502

2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655

3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so

4 0x00007ffee6372542 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so

5 0x00007ffff77306db in start_thread (arg=0x7fffee764700) at pthread_create.c:463

6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7fffeef65700 (LWP 10065)):

0 0x00007ffff77369f3 in futex_wait_cancelable (private=, expected=0, futex_word=0x7ffee7dfad10) at ../sysdeps/unix/sysv/linux/futex-internal.h:88

1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffee7dfacc0, cond=0x7ffee7dface8) at pthread_cond_wait.c:502

2 __pthread_cond_wait (cond=0x7ffee7dface8, mutex=0x7ffee7dfacc0) at pthread_cond_wait.c:655

3 0x00007ffee63c6072 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so

4 0x00007ffee6370552 in ?? () from /opt/rocm/opencl/lib/x86_64/libamd_comgr.so

5 0x00007ffff77306db in start_thread (arg=0x7fffeef65700) at pthread_create.c:463

6 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7fffefa86700 (LWP 10064)):

0 0x00007ffff6f835d7 in ioctl () at ../sysdeps/unix/syscall-template.S:78

1 0x00007ffff0882f28 in kmtIoctl () from /opt/rocm/lib/libhsakmt.so.1

2 0x00007ffff087d36f in hsaKmtWaitOnMultipleEvents () from /opt/rocm/lib/libhsakmt.so.1

3 0x00007ffff0af2fd3 in core::Signal::WaitAny(unsigned int, hsa_signal_s const, hsa_signal_condition_t const, long const, unsigned long, hsa_wait_state_t, long) () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1

4 0x00007ffff0adbdf6 in AMD::hsa_amd_signal_wait_any(unsigned int, hsa_signal_s, hsa_signal_condition_t, long, unsigned long, hsa_wait_state_t, long) () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1

5 0x00007ffff0aeb48a in core::Runtime::AsyncEventsLoop(void*) () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1

6 0x00007ffff0aae797 in os::ThreadTrampoline(void*) () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1

7 0x00007ffff77306db in start_thread (arg=0x7fffefa86700) at pthread_create.c:463

8 0x00007ffff6f8e88f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7ffff150ef00 (LWP 10058)):

0 0x00007ffff77396d6 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x555555989318) at ../sysdeps/unix/sysv/linux/futex-internal.h:205

1 do_futex_wait (sem=sem@entry=0x555555989318, abstime=0x0) at sem_waitcommon.c:111

2 0x00007ffff77397c8 in __new_sem_wait_slow (sem=0x555555989318, abstime=0x0) at sem_waitcommon.c:181

3 0x00007ffff0e44a70 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so

4 0x00007ffff0e448b9 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so

5 0x00007ffff0e57d23 in ?? () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so

6 0x00007ffff0e291d9 in clCreateCommandQueueWithProperties () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so

7 0x00007ffff0e29469 in clCreateCommandQueue () from /opt/rocm/opencl/lib/x86_64/libamdocl64.so

8 0x0000555555559327 in main (argc=, argv=) at myopencl.c:9519
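In the Thread 1 backtrace above, the main thread sits in a semaphore wait with abstime=0x0, meaning the wait has no timeout. While debugging a hang like this, one option is to run the suspect call on a worker thread and wait for it with a deadline, so a test harness can report the freeze instead of blocking forever. The following is only a minimal sketch under stated assumptions: `run_with_timeout`, `suspect_fn`, `struct watchdog_job` and the two stand-in functions are hypothetical names that belong to neither OpenCL nor ROCm, and the stand-ins take the place of the real clCreateCommandQueue call. It detects a hang; it does not fix one.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper type: any call that might block forever. */
typedef void (*suspect_fn)(void *arg);

struct watchdog_job {
    suspect_fn fn;   /* the call under suspicion */
    void *arg;       /* argument handed to fn */
    sem_t done;      /* posted by the worker once fn has returned */
};

static void *watchdog_worker(void *p)
{
    struct watchdog_job *job = p;
    job->fn(job->arg);
    sem_post(&job->done);
    return NULL;
}

/* Returns 0 if fn finished in time, 1 on timeout, -1 on setup error. */
int run_with_timeout(suspect_fn fn, void *arg, long timeout_ms)
{
    struct watchdog_job *job = malloc(sizeof *job);
    if (job == NULL)
        return -1;
    job->fn = fn;
    job->arg = arg;
    if (sem_init(&job->done, 0, 0) != 0) {
        free(job);
        return -1;
    }

    pthread_t tid;
    if (pthread_create(&tid, NULL, watchdog_worker, job) != 0) {
        sem_destroy(&job->done);
        free(job);
        return -1;
    }

    /* sem_timedwait takes an absolute CLOCK_REALTIME deadline. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc;
    while ((rc = sem_timedwait(&job->done, &deadline)) == -1 && errno == EINTR)
        continue;   /* interrupted by a signal: retry with the same deadline */

    if (rc == 0) {
        pthread_join(tid, NULL);
        sem_destroy(&job->done);
        free(job);
        return 0;
    }

    /* The worker may still be blocked inside fn and still owns job,
     * so the job is deliberately leaked and the thread detached. */
    int timed_out = (errno == ETIMEDOUT);
    pthread_detach(tid);
    return timed_out ? 1 : -1;
}

/* Stand-ins for demonstration only; a real test would wrap the
 * clCreateCommandQueue call in a function of this shape. */
void returns_quickly(void *arg)
{
    (void)arg;
}

void blocks_for_a_while(void *arg)
{
    (void)arg;
    struct timespec one_second = {1, 0};
    nanosleep(&one_second, NULL);
}
```

Calling `run_with_timeout(returns_quickly, NULL, 500)` should report completion, while `run_with_timeout(blocks_for_a_while, NULL, 100)` should report a timeout. Build with gcc and the -pthread option.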

Here is the output of clinfo:

Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3019.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices

Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: Vega 10 XT [Radeon RX Vega 64]
Device Topology: PCI[ B#13, D#0, F#0 ]
Max compute units: 64
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 1630Mhz
Address bits: 64
Max memory allocation: 7287183769
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 26751
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
  Denorms: Yes
  Quiet NaNs: Yes
  Round to nearest even: Yes
  Round to zero: Yes
  Round to +ve and infinity: Yes
  IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 64
Cache size: 16384
Global memory size: 8573157376
Constant buffer size: 7287183769
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 2992216473
Max global variable size: 7287183769
Max global variable preferred total size: 8573157376
Max read/write image args: 64
Max on device events: 1024
Queue on device max size: 8388608
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
  Coarse grain buffer: Yes
  Fine grain buffer: Yes
  Fine grain system: No
  Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 64
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
  Execute OpenCL kernels: Yes
  Execute native function: No
Queue on Host properties:
  Out-of-Order: No
  Profiling : Yes
Queue on Device properties:
  Out-of-Order: Yes
  Profiling : Yes
Platform ID: 0x7f450a801d50
Name: gfx900
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 3019.0 (HSA1.1,LC)
Profile: FULL_PROFILE
Version: OpenCL 2.0
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program

Here is the output of rocminfo:

ROCk module is loaded
linuxperia is member of video group
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE

==========
HSA Agents
==========


Agent 1


Name: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Marketing Name: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4200
BDFID: 0
Internal Node ID: 0
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32895880(0x1f5f388) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32895880(0x1f5f388) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: TRUE
ISA Info:
N/A


Agent 2


Name: gfx900
Marketing Name: Vega 10 XT [Radeon RX Vega 64]
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 4096(0x1000)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
Chip ID: 26751(0x687f)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1630
BDFID: 3328
Internal Node ID: 1
Compute Unit: 64
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: FALSE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension: x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension: x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8372224(0x7fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Acessible by all: FALSE
Pool 2
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Acessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx900
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension: x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension: x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
Done

Leo7654 commented 2 years ago

+1