ROCm / clr

[Issue]: ROCm 6.1.3: any OpenCL program (clinfo or custom) crashes on load #93

Open artyom-beilis opened 3 weeks ago

artyom-beilis commented 3 weeks ago

Problem Description

Any OpenCL program crashes with a segmentation fault while the ROCm OpenCL platform is being initialized.

OpenCL runtime

ii  rocm-opencl                                   2.0.0.60103-122~22.04                       amd64        clr built using CMake
ii  rocm-opencl-dbgsym                            2.0.0.60103-122~22.04                       amd64        debug symbols for rocm-opencl
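
For readers cross-checking the installed release against the package listing above, here is a minimal sketch. It assumes that the five-digit field in the package version (60103 here) encodes the ROCm release as major*10000 + minor*100 + patch, which matches the /opt/rocm-6.1.3 install path seen in the backtraces. The function name is hypothetical and not part of any ROCm tool.

```python
import re


def rocm_release_from_pkg_version(pkg_version):
    """Guess the ROCm release from a package version such as '2.0.0.60103-122~22.04'.

    Assumption: the last dotted field before the build suffix encodes
    major*10000 + minor*100 + patch. Returns a (major, minor, patch) tuple,
    or None if no such field is present.
    """
    match = re.search(r"\.(\d{5,6})(?:-|$)", pkg_version)
    if match is None:
        return None
    code = int(match.group(1))
    return (code // 10000, (code // 100) % 100, code % 100)
```

Applied to the rocm-opencl version listed above, this yields (6, 1, 3).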

Here is the backtrace from clinfo:

Thread 1 "clinfo" received signal SIGSEGV, Segmentation fault.
__memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:351
351     ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(gdb) bt
#0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:351
#1  0x00007fffe883e49c in ?? () from /opt/rocm-6.1.3/lib/libhsa-runtime64.so.1
#2  0x00007fffe883f32d in ?? () from /opt/rocm-6.1.3/lib/libhsa-runtime64.so.1
#3  0x00007fffe884578d in ?? () from /opt/rocm-6.1.3/lib/libhsa-runtime64.so.1
#4  0x00007fffe8882bed in ?? () from /opt/rocm-6.1.3/lib/libhsa-runtime64.so.1
#5  0x00007fffe8882ddc in ?? () from /opt/rocm-6.1.3/lib/libhsa-runtime64.so.1
#6  0x00007fffe8856f5e in ?? () from /opt/rocm-6.1.3/lib/libhsa-runtime64.so.1
#7  0x00007ffff6d39040 in roc::Device::init () at /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/clr/rocclr/device/rocm/rocdevice.cpp:476
#8  0x00007ffff6cf6bf5 in amd::Device::init () at /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/clr/rocclr/device/device.cpp:488
#9  0x00007ffff6d26b5e in amd::Runtime::init () at /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/clr/rocclr/platform/runtime.cpp:75
#10 0x00007ffff6cd1885 in ShouldLoadPlatform () at /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/clr/opencl/amdocl/cl_icd.cpp:224
#11 operator() (__closure=<optimized out>) at /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/clr/opencl/amdocl/cl_icd.cpp:274
#12 std::__invoke_impl<void, clIcdGetPlatformIDsKHR(cl_uint, _cl_platform_id**, cl_uint*)::<lambda()> > (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#13 std::__invoke<clIcdGetPlatformIDsKHR(cl_uint, _cl_platform_id**, cl_uint*)::<lambda()> > (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#14 operator() (__closure=<optimized out>) at /usr/include/c++/11/mutex:776
#15 operator() (__closure=<optimized out>) at /usr/include/c++/11/mutex:712
#16 _FUN () at /usr/include/c++/11/mutex:712
#17 0x00007ffff7c99ee8 in __pthread_once_slow (once_control=0x7ffff6df5740 <clIcdGetPlatformIDsKHR::initOnce>, init_routine=0x7ffff6edad50 <__once_proxy>) at ./nptl/pthread_once.c:116
#18 0x00007ffff6cd290f in __gthread_once (__func=<optimized out>, __once=0x7ffff6df5740 <clIcdGetPlatformIDsKHR::initOnce>) at /usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:700
#19 std::call_once<clIcdGetPlatformIDsKHR(cl_uint, _cl_platform_id**, cl_uint*)::<lambda()> > (__once=..., __f=...) at /usr/include/c++/11/mutex:783
#20 clIcdGetPlatformIDsKHR (num_entries=<optimized out>, platforms=0x0, num_platforms=0x7fffffffe0e8) at /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/clr/opencl/amdocl/cl_icd.cpp:274
#21 0x00007ffff7f97024 in ?? () from /lib/x86_64-linux-gnu/libOpenCL.so.1
#22 0x00007ffff7f97feb in clGetPlatformIDs () from /lib/x86_64-linux-gnu/libOpenCL.so.1
#23 0x000055555555b765 in ?? ()
#24 0x00007ffff7c29d90 in __libc_start_call_main (main=main@entry=0x55555555b5a0, argc=argc@entry=2, argv=argv@entry=0x7fffffffe3d8) at ../sysdeps/nptl/libc_start_call_main.h:58
#25 0x00007ffff7c29e40 in __libc

Here is the backtrace from another OpenCL program (the flops tool from dlprimitives):


#0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:351
#1  0x00007fffe863e49c in ?? () from /opt/rocm-6.1.3/lib/libhsa-runtime64.so.1
#2  0x00007fffe863f32d in ?? () from /opt/rocm-6.1.3/lib/libhsa-runtime64.so.1
#3  0x00007fffe864578d in ?? () from /opt/rocm-6.1.3/lib/libhsa-runtime64.so.1
#4  0x00007fffe8682bed in ?? () from /opt/rocm-6.1.3/lib/libhsa-runtime64.so.1
#5  0x00007fffe8682ddc in ?? () from /opt/rocm-6.1.3/lib/libhsa-runtime64.so.1
#6  0x00007fffe8656f5e in ?? () from /opt/rocm-6.1.3/lib/libhsa-runtime64.so.1
#7  0x00007ffff6b39040 in roc::Device::init () at /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/clr/rocclr/device/rocm/rocdevice.cpp:476
#8  0x00007ffff6af6bf5 in amd::Device::init () at /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/clr/rocclr/device/device.cpp:488
#9  0x00007ffff6b26b5e in amd::Runtime::init () at /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/clr/rocclr/platform/runtime.cpp:75
#10 0x00007ffff6ad1885 in ShouldLoadPlatform () at /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/clr/opencl/amdocl/cl_icd.cpp:224
#11 operator() (__closure=<optimized out>) at /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/clr/opencl/amdocl/cl_icd.cpp:274
#12 std::__invoke_impl<void, clIcdGetPlatformIDsKHR(cl_uint, _cl_platform_id**, cl_uint*)::<lambda()> > (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#13 std::__invoke<clIcdGetPlatformIDsKHR(cl_uint, _cl_platform_id**, cl_uint*)::<lambda()> > (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#14 operator() (__closure=<optimized out>) at /usr/include/c++/11/mutex:776
#15 operator() (__closure=<optimized out>) at /usr/include/c++/11/mutex:712
#16 _FUN () at /usr/include/c++/11/mutex:712
#17 0x00007ffff7899ee8 in __pthread_once_slow (once_control=0x7ffff6bf5740 <clIcdGetPlatformIDsKHR::initOnce>, init_routine=0x7ffff7cdad50 <__once_proxy>) at ./nptl/pthread_once.c:116
#18 0x00007ffff6ad290f in __gthread_once (__func=<optimized out>, __once=0x7ffff6bf5740 <clIcdGetPlatformIDsKHR::initOnce>) at /usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:700
#19 std::call_once<clIcdGetPlatformIDsKHR(cl_uint, _cl_platform_id**, cl_uint*)::<lambda()> > (__once=..., __f=...) at /usr/include/c++/11/mutex:783
#20 clIcdGetPlatformIDsKHR (num_entries=<optimized out>, platforms=0x0, num_platforms=0x7fffffffd8b8) at /long_pathname_so_that_rpms_can_package_the_debug_info/src/external/clr/opencl/amdocl/cl_icd.cpp:274
#21 0x00007ffff7ecf024 in ?? () from /lib/x86_64-linux-gnu/libOpenCL.so.1
#22 0x00007ffff7ecffeb in clGetPlatformIDs () from /lib/x86_64-linux-gnu/libOpenCL.so.1
#23 0x00007ffff7f1d969 in cl::Platform::get (platforms=0x7fffffffda20) at /usr/include/CL/opencl.hpp:2715
#24 dlprim::Context::select_opencl_device (this=0x7fffffffe010, p=0, d=0) at /home/noga/Projects/pytorch_dlprim/dlprimitives/src/context.cpp:152
#25 0x00007ffff7f1e10c in dlprim::Context::Context (this=this@entry=0x7fffffffe010, dev_id=...) at /home/noga/Projects/pytorch_dlprim/dlprimitives/src/context.cpp:56
#26 0x000055555555a907 in get_flops (device=..., scale=<optimized out>) at /home/noga/Projects/pytorch_dlprim/dlprimitives/tools/flops.cpp:811
#27 0x000055555555a027 in main (argc=<optimized out>, argv=0x7fffffffe358) at /home/noga/Projects/pytorch_dlprim/dlprimitives/tools/flops.cpp:935
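
Both backtraces enter the runtime through clGetPlatformIDs with platforms=0x0, so the crash can probably be reproduced without clinfo or dlprimitives. The following is a minimal sketch using Python's ctypes. It assumes the ICD loader is installed as libOpenCL.so.1 (as in the backtraces); the helper name is hypothetical. On an affected system the call is expected to segfault inside the process rather than return.

```python
import ctypes
import ctypes.util


def count_opencl_platforms(lib_name=None):
    """Return the number of OpenCL platforms, or None if the ICD loader cannot be loaded.

    Mirrors the failing call in the backtraces:
    clGetPlatformIDs(0, NULL, &num_platforms).
    """
    name = lib_name or ctypes.util.find_library("OpenCL") or "libOpenCL.so.1"
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None  # no ICD loader with this name on the system

    get_ids = lib.clGetPlatformIDs
    get_ids.argtypes = [
        ctypes.c_uint32,                  # cl_uint num_entries
        ctypes.c_void_p,                  # cl_platform_id *platforms (NULL here)
        ctypes.POINTER(ctypes.c_uint32),  # cl_uint *num_platforms
    ]
    get_ids.restype = ctypes.c_int32      # cl_int error code

    num_platforms = ctypes.c_uint32(0)
    err = get_ids(0, None, ctypes.byref(num_platforms))
    if err != 0:  # CL_SUCCESS is 0
        raise RuntimeError(f"clGetPlatformIDs failed with error {err}")
    return num_platforms.value
```

Calling count_opencl_platforms() with no arguments performs the query; a clean return value indicates that platform initialization completed.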

While I understand that you don't support HIP, PyTorch, and other components on GCN4, OpenCL is part of the official specs of the GPU, and I expect a working OpenCL driver.

OS:
NAME="Ubuntu"
VERSION="22.04.4 LTS (Jammy Jellyfish)"

CPU: 
model name      : Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz

GPU:
  Name:                    Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
  Marketing Name:          Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
  Name:                    gfx803                             
  Marketing Name:          Radeon RX 560 Series               
      Name:                    amdgcn-amd-amdhsa--gfx803          

Operating System

Ubuntu 22.04 amd64

CPU

Intel i5 6600

GPU

AMD Radeon RX 560 Series (gfx803)

ROCm Version

ROCm 6.1.3

ROCm Component

clr

Steps to Reproduce

Install the ROCm OpenCL runtime (rocm-opencl) and run clinfo -l

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

$ /opt/rocm/bin/rocminfo --support
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.13
Runtime Ext Version:     1.4
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
  Uuid:                    CPU-XX                             
  Marketing Name:          Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3900                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            4                                  
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    16235816(0xf7bd28) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16235816(0xf7bd28) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16235816(0xf7bd28) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx803                             
  Uuid:                    GPU-XX                             
  Marketing Name:          Radeon RX 560 Series               
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
  Chip ID:                 26623(0x67ff)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   1196                               
  BDFID:                   256                                
  Internal Node ID:        1                                  
  Compute Unit:            16                                 
  SIMDs per CU:            4                                  
  Shader Engines:          2                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 730                                
  SDMA engine uCode::      58                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    4194304(0x400000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    4194304(0x400000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx803          
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             

Additional Information

No response

artyom-beilis commented 3 weeks ago

Same with ROCm 6.2.

atamazov commented 2 weeks ago

@artyom-beilis AMD officially dropped gfx8 support as of ROCm 4.0. Related ticket: https://github.com/ROCm/ROCm/issues/1265

artyom-beilis commented 2 weeks ago

  1. Even so, clinfo or any other program shouldn't crash.
  2. Brand new GCN4 cards are still being sold, and there is no official working OpenCL driver for Linux, although there should be one as advertised.

zichguan-amd commented 1 week ago

Hi @artyom-beilis, we will look into this. Can you try running clinfo with AMD_LOG_LEVEL=4, e.g. AMD_LOG_LEVEL=4 clinfo? There is a similar issue, ROCm/ROCm#3664, that may have the same root cause and is under investigation.
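
If it helps with collecting the requested log, here is a minimal sketch that runs a command with AMD_LOG_LEVEL set and saves everything it prints. It assumes the runtime writes its log to stderr; the function name and the default log file name are hypothetical. A negative return code means the child was killed by a signal (for example, -11 for SIGSEGV).

```python
import os
import subprocess


def run_with_amd_log(cmd, level=4, log_path="amd_log.txt"):
    """Run cmd with AMD_LOG_LEVEL=<level> and write its stdout and stderr to log_path.

    Returns the child's return code (negative if it was killed by a signal).
    """
    env = dict(os.environ, AMD_LOG_LEVEL=str(level))
    proc = subprocess.run(
        cmd,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the runtime log is captured
        text=True,
        errors="replace",          # tolerate non-UTF-8 bytes in the output
    )
    with open(log_path, "w") as log_file:
        log_file.write(proc.stdout)
    return proc.returncode
```

For this issue the call would be run_with_amd_log(["clinfo"]), after which the log file can be attached to the thread.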

artyom-beilis commented 1 week ago

That is problematic, since I have downgraded to ROCm 5.7.2, where at least I have a working ROCm OpenCL driver.

I'll see when I can do it. In any case, from the crash backtrace it looks like the crash location is different.
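
To compare crash locations between reports, a small helper can reduce a gdb backtrace to (frame number, function, location) tuples. This is only a sketch for the `bt` output format shown in this issue; the function names are hypothetical.

```python
import re

_FRAME = re.compile(r"^#(\d+)\s+(.*)$")
_ADDR = re.compile(r"^0x[0-9a-fA-F]+ in ")
_WHERE = re.compile(r" (?:from|at) (\S+)\s*$")


def parse_backtrace(text):
    """Parse gdb `bt` output into a list of (frame_number, function, location) tuples.

    location is the library path or the source file:line, or None when gdb
    printed neither.
    """
    frames = []
    for line in text.splitlines():
        match = _FRAME.match(line.strip())
        if match is None:
            continue  # not a frame line
        rest = _ADDR.sub("", match.group(2))   # drop the leading "0x... in "
        func = rest.split(" (", 1)[0]          # text before the argument list
        where = _WHERE.search(rest)
        frames.append((int(match.group(1)), func, where.group(1) if where else None))
    return frames


def first_frame_matching(frames, needle):
    """Return the innermost frame whose location contains needle, or None."""
    for frame in frames:
        if frame[2] is not None and needle in frame[2]:
            return frame
    return None
```

For the backtraces in this issue, first_frame_matching(frames, "/clr/") points at roc::Device::init in rocdevice.cpp:476, which is the last clr frame before the call into libhsa-runtime64.so.1.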