ROCm / rocm-examples

A collection of examples for the ROCm software stack
MIT License

[Issue][Build]: Failed to build with `error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+` #110

Closed Gardene-el closed 6 months ago

Gardene-el commented 6 months ago

Problem Description

Build steps and log:

❯ git clone https://github.com/ROCm/rocm-examples.git

Cloning into 'rocm-examples'...
remote: Enumerating objects: 8620, done.
remote: Counting objects: 100% (1799/1799), done.
remote: Compressing objects: 100% (517/517), done.
remote: Total 8620 (delta 1470), reused 1472 (delta 1277), pack-reused 6821
Receiving objects: 100% (8620/8620), 1.82 MiB | 2.22 MiB/s, done.
Resolving deltas: 100% (7425/7425), done.

❯ cd rocm-examples
❯ cmake -S . -B build

-- The CXX compiler identification is GNU 13.2.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- The HIP compiler identification is Clang 17.0.0
-- Detecting HIP compiler ABI info
-- Detecting HIP compiler ABI info - done
-- Check for working HIP compiler: /opt/rocm/llvm/bin/clang++ - skipped
-- Detecting HIP compile features
-- Detecting HIP compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- GPU_ARCHITECTURES: gfx1034
-- GPU_ARCHITECTURES: gfx1034
-- Found Perl: /usr/bin/perl (found version "5.38.2")
-- Found Vulkan: /lib/libvulkan.so (found version "1.3.279") found components: glslangValidator glslc
-- Configuring done (4.1s)
-- Generating done (0.1s)
-- Build files have been written to: /home/cr0c0dile/Documents/rocm-examples/build

❯ cmake --build build

[  0%] Building HIP object Applications/bitonic_sort/CMakeFiles/applications_bitonic_sort.dir/main.hip.o
[  1%] Linking HIP executable applications_bitonic_sort
[  1%] Built target applications_bitonic_sort
[  1%] Building HIP object Applications/convolution/CMakeFiles/applications_convolution.dir/main.hip.o
[  2%] Linking HIP executable applications_convolution
[  2%] Built target applications_convolution
[  2%] Building HIP object Applications/floyd_warshall/CMakeFiles/applications_floyd_warshall.dir/main.hip.o
[  3%] Linking HIP executable applications_floyd_warshall
[  3%] Built target applications_floyd_warshall
[  3%] Building HIP object Applications/histogram/CMakeFiles/applications_histogram.dir/main.hip.o
[  4%] Linking HIP executable applications_histogram
[  4%] Built target applications_histogram
[  4%] Building HIP object Applications/monte_carlo_pi/CMakeFiles/applications_monte_carlo_pi.dir/main.hip.o
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr0 = V_MOV_B32_dpp $vgpr0(tied-def 0), killed $vgpr1, 322, 15, 15, 0, implicit $exec
1 error generated when compiling for gfx1034.
make[2]: *** [Applications/monte_carlo_pi/CMakeFiles/applications_monte_carlo_pi.dir/build.make:75: Applications/monte_carlo_pi/CMakeFiles/applications_monte_carlo_pi.dir/main.hip.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:2095: Applications/monte_carlo_pi/CMakeFiles/applications_monte_carlo_pi.dir/all] Error 2
make: *** [Makefile:146: all] Error 2

Operating System

Arch Linux

CPU

AMD Ryzen 5 5600 6-Core Processor

GPU

AMD Radeon VII

ROCm Version

ROCm 6.0.0

ROCm Component

No response

Steps to Reproduce

git clone https://github.com/ROCm/rocm-examples.git
cd rocm-examples
cmake -S . -B build
cmake --build build

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

❯ rocminfo --support
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 5 5600 6-Core Processor  
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 5 5600 6-Core Processor  
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3500                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            12                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    16296360(0xf8a9a8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16296360(0xf8a9a8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16296360(0xf8a9a8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1034                            
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 6500 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      1024(0x400) KB                     
    L3:                      16384(0x4000) KB                   
  Chip ID:                 29759(0x743f)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2975                               
  BDFID:                   2560                               
  Internal Node ID:        1                                  
  Compute Unit:            16                                 
  SIMDs per CU:            2                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 116                                
  SDMA engine uCode::      34                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    4177920(0x3fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    4177920(0x3fc000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1034         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***             

Additional Information

Sorry that the GPU list in the form above doesn't include my GPU; I had to pick a different one instead. The information below is about my actual GPU:

❯ echo "GPU:" && /opt/rocm/bin/rocminfo | grep -E "^\s*(Name|Marketing Name)";
GPU:
Name: AMD Ryzen 5 5600 6-Core Processor
Marketing Name: AMD Ryzen 5 5600 6-Core Processor
Name: gfx1034
Marketing Name: AMD Radeon RX 6500 XT
Name: amdgcn-amd-amdhsa--gfx1034

Snektron commented 6 months ago

This is caused by hipcub::DeviceReduce::Sum, which calls into rocPRIM on AMD. It was previously caused by a macro that wasn't properly defined for these architectures, and it has been fixed in 6.0 or 6.1. See https://github.com/ROCm/rocPRIM/issues/452. Could you try with an updated rocPRIM installation? The easiest way to do that is to use the rocm/rocm-terminal:6.1 docker image.

Gardene-el commented 6 months ago

I tried the rocm/rocm-terminal image, and that avoided the error: Illegal instruction detected. But in the same project I got another error, which I guess is also a wrapper problem.

make[2]: Entering directory '/home/rocm-user/rocm-examples/Applications/monte_carlo_pi'
/opt/rocm/bin/hipcc -std=c++17 -Wall -Wextra -I ../../Common -isystem /opt/rocm/include -isystem /opt/rocm/include -I ../../Common -D__HIP_PLATFORM_AMD__  -L /opt/rocm/lib  -o applications_monte_carlo_pi main.hip -lhiprand 
In file included from main.hip:25:
../../Common/hiprand_utils.hpp:28:10: fatal error: 'hiprand/hiprand.h' file not found
   28 | #include <hiprand/hiprand.h>
      |          ^~~~~~~~~~~~~~~~~~~
1 error generated when compiling for gfx1034.

The whole log:

❯ sudo docker pull rocm/rocm-terminal
sudo docker run -it --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video rocm/rocm-terminal
[sudo] password for cr0c0dile: 
Using default tag: latest
latest: Pulling from rocm/rocm-terminal
17d0386c2fff: Pull complete 
268c8493c252: Pull complete 
009ab54bcd2e: Pull complete 
944182d77231: Pull complete 
4f4fb700ef54: Pull complete 
Digest: sha256:803f6250c28d86b3997cd1ef49af238c66e20276139f59d6740075e4915aa0cb
Status: Downloaded newer image for rocm/rocm-terminal:latest
docker.io/rocm/rocm-terminal:latest
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

rocm-user@4edf32dc05ea:~$ git clone https://github.com/ROCm/rocm-examples.git
Cloning into 'rocm-examples'...
remote: Enumerating objects: 8646, done.
remote: Counting objects: 100% (1834/1834), done.
remote: Compressing objects: 100% (546/546), done.
remote: Total 8646 (delta 1496), reused 1476 (delta 1283), pack-reused 6812
Receiving objects: 100% (8646/8646), 1.83 MiB | 2.77 MiB/s, done.
Resolving deltas: 100% (7446/7446), done.
rocm-user@4edf32dc05ea:~$  cd rocm-examples
rocm-user@4edf32dc05ea:~/rocm-examples$ cmake -S . -B build
CMake Error at CMakeLists.txt:23 (cmake_minimum_required):
  CMake 3.21.3 or higher is required.  You are running version 3.16.3

-- Configuring incomplete, errors occurred!
rocm-user@4edf32dc05ea:~/rocm-examples$ make
make -C Applications 
make[1]: Entering directory '/home/rocm-user/rocm-examples/Applications'
make -C bitonic_sort 
make[2]: Entering directory '/home/rocm-user/rocm-examples/Applications/bitonic_sort'
/opt/rocm/bin/hipcc -std=c++17 -Wall -Wextra -I ../../Common    -o applications_bitonic_sort main.hip  
make[2]: Leaving directory '/home/rocm-user/rocm-examples/Applications/bitonic_sort'
make -C convolution 
make[2]: Entering directory '/home/rocm-user/rocm-examples/Applications/convolution'
/opt/rocm/bin/hipcc -std=c++17 -Wall -Wextra -I ../../Common    -o applications_convolution main.hip  
make[2]: Leaving directory '/home/rocm-user/rocm-examples/Applications/convolution'
make -C floyd_warshall 
make[2]: Entering directory '/home/rocm-user/rocm-examples/Applications/floyd_warshall'
/opt/rocm/bin/hipcc -std=c++17 -Wall -Wextra -I ../../Common    -o applications_floyd_warshall main.hip  
make[2]: Leaving directory '/home/rocm-user/rocm-examples/Applications/floyd_warshall'
make -C histogram 
make[2]: Entering directory '/home/rocm-user/rocm-examples/Applications/histogram'
/opt/rocm/bin/hipcc -std=c++17 -Wall -Wextra -I ../../Common    -o applications_histogram main.hip  
make[2]: Leaving directory '/home/rocm-user/rocm-examples/Applications/histogram'
make -C prefix_sum 
make[2]: Entering directory '/home/rocm-user/rocm-examples/Applications/prefix_sum'
/opt/rocm/bin/hipcc -std=c++17 -Wall -Wextra -I ../../Common    -o applications_prefix_sum main.hip  
make[2]: Leaving directory '/home/rocm-user/rocm-examples/Applications/prefix_sum'
make -C monte_carlo_pi 
make[2]: Entering directory '/home/rocm-user/rocm-examples/Applications/monte_carlo_pi'
/opt/rocm/bin/hipcc -std=c++17 -Wall -Wextra -I ../../Common -isystem /opt/rocm/include -isystem /opt/rocm/include -I ../../Common -D__HIP_PLATFORM_AMD__  -L /opt/rocm/lib  -o applications_monte_carlo_pi main.hip -lhiprand 
In file included from main.hip:25:
../../Common/hiprand_utils.hpp:28:10: fatal error: 'hiprand/hiprand.h' file not found
   28 | #include <hiprand/hiprand.h>
      |          ^~~~~~~~~~~~~~~~~~~
1 error generated when compiling for gfx1034.
make[2]: *** [Makefile:64: applications_monte_carlo_pi] Error 1
make[2]: Leaving directory '/home/rocm-user/rocm-examples/Applications/monte_carlo_pi'
make[1]: *** [Makefile:41: monte_carlo_pi] Error 2
make[1]: Leaving directory '/home/rocm-user/rocm-examples/Applications'
make: *** [Makefile:34: Applications] Error 2

In case the Unix Makefile and CMake builds behave differently, I also built manually with make on the host and got the same error: Illegal instruction detected:

❯ make
make -C Applications 
make[1]: Entering directory '/home/cr0c0dile/Documents/rocm-examples/Applications'
make -C bitonic_sort 
make[2]: Entering directory '/home/cr0c0dile/Documents/rocm-examples/Applications/bitonic_sort'
make[2]: 'applications_bitonic_sort' is up to date.
make[2]: Leaving directory '/home/cr0c0dile/Documents/rocm-examples/Applications/bitonic_sort'
make -C convolution 
make[2]: Entering directory '/home/cr0c0dile/Documents/rocm-examples/Applications/convolution'
make[2]: 'applications_convolution' is up to date.
make[2]: Leaving directory '/home/cr0c0dile/Documents/rocm-examples/Applications/convolution'
make -C floyd_warshall 
make[2]: Entering directory '/home/cr0c0dile/Documents/rocm-examples/Applications/floyd_warshall'
make[2]: 'applications_floyd_warshall' is up to date.
make[2]: Leaving directory '/home/cr0c0dile/Documents/rocm-examples/Applications/floyd_warshall'
make -C histogram 
make[2]: Entering directory '/home/cr0c0dile/Documents/rocm-examples/Applications/histogram'
make[2]: 'applications_histogram' is up to date.
make[2]: Leaving directory '/home/cr0c0dile/Documents/rocm-examples/Applications/histogram'
make -C prefix_sum 
make[2]: Entering directory '/home/cr0c0dile/Documents/rocm-examples/Applications/prefix_sum'
make[2]: 'applications_prefix_sum' is up to date.
make[2]: Leaving directory '/home/cr0c0dile/Documents/rocm-examples/Applications/prefix_sum'
make -C monte_carlo_pi 
make[2]: Entering directory '/home/cr0c0dile/Documents/rocm-examples/Applications/monte_carlo_pi'
/opt/rocm/bin/hipcc -std=c++17 -Wall -Wextra -I ../../Common -isystem /opt/rocm/include -isystem /opt/rocm/include -I ../../Common -D__HIP_PLATFORM_AMD__ -L /opt/rocm/lib -o applications_monte_carlo_pi main.hip -lhiprand
/opt/rocm/bin/rocm_agent_enumerator:95: SyntaxWarning: invalid escape sequence '\w'
  @staticVars(search_name=re.compile("gfx[0-9a-fA-F]+(:[-+:\w]+)?"))
/opt/rocm/bin/rocm_agent_enumerator:152: SyntaxWarning: invalid escape sequence '\A'
  line_search_term = re.compile("\A\s+Name:\s+(amdgcn-amd-amdhsa--gfx\d+)")
/opt/rocm/bin/rocm_agent_enumerator:154: SyntaxWarning: invalid escape sequence '\A'
  line_search_term = re.compile("\A\s+Name:\s+(gfx\d+)")
/opt/rocm/bin/rocm_agent_enumerator:175: SyntaxWarning: invalid escape sequence '\w'
  target_search_term = re.compile("1002:\w+")
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr3 = V_MOV_B32_dpp undef $vgpr3(tied-def 0), $vgpr1, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr3 = V_MOV_B32_dpp undef $vgpr3(tied-def 0), $vgpr1, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr3 = V_MOV_B32_dpp undef $vgpr3(tied-def 0), $vgpr1, 322, 15, 15, 0, implicit $exec
error: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+
renamable $vgpr3 = V_MOV_B32_dpp undef $vgpr3(tied-def 0), $vgpr1, 322, 15, 15, 0, implicit $exec
4 errors generated when compiling for gfx1034.
make[2]: *** [Makefile:64: applications_monte_carlo_pi] Error 1
make[2]: Leaving directory '/home/cr0c0dile/Documents/rocm-examples/Applications/monte_carlo_pi'
make[1]: *** [Makefile:41: monte_carlo_pi] Error 2
make[1]: Leaving directory '/home/cr0c0dile/Documents/rocm-examples/Applications'
make: *** [Makefile:34: Applications] Error 2

The installed rocprim version is 6.0:

❯ sudo pacman -Qi rocprim

Name            : rocprim
Version         : 6.0.2-1
Description     : Header-only library providing HIP parallel primitives
Architecture    : any
URL             : https://rocm.docs.amd.com/projects/rocPRIM/en/latest/index.html
Licenses        : MIT
Groups          : None
Provides        : None
Depends On      : rocm-core  hip
Optional Deps   : None
Required By     : hipcub  rocalution  rocm-hip-sdk  rocsparse  rocthrust
Optional For    : None
Conflicts With  : None
Replaces        : None
Installed Size  : 3.08 MiB
Packager        : Torsten Keßler <tpkessler@archlinux.org>
Build Date      : Mon 26 Feb 2024 03:00:29 PM CST
Install Date    : Fri 03 May 2024 05:13:04 AM CST
Install Reason  : Installed as a dependency for another package
Install Script  : No
Validated By    : Signature

The other ROCm packages are all at version 6.0 as well:

❯ sudo pacman -Qs roc
local/alsa-card-profiles 1:1.0.5-1
    Low-latency audio/video router and processor - ALSA card profiles
local/amd-ucode 20240409.1addd7dc-1
    Microcode update image for AMD CPUs
local/comgr 6.0.2-1
    Compiler support library for ROCm LLVM
local/ffmpeg 2:6.1.1-7
    Complete solution to record, convert and stream audio and video
local/graphite 1:1.3.14-3
    reimplementation of the SIL Graphite text processing engine
local/hip-runtime-amd 6.0.2-2
    Heterogeneous Interface for Portability ROCm
local/hipblas 6.0.2-1
    ROCm BLAS marshalling library
local/hipcub 6.0.2-1
    Header-only library on top of rocPRIM or CUB
local/hipfft 6.0.2-1
    rocFFT marshalling library.
local/hiprand 6.0.2-1
    rocRAND marshalling library
local/hipsolver 6.0.2-1
    rocSOLVER marshalling library.
local/hipsparse 6.0.2-1
    rocSPARSE marshalling library.
local/hsa-rocr 6.0.2-2
    HSA Runtime API and runtime for ROCm
local/hsakmt-roct 6.0.0-2
    Radeon Open Compute Thunk Interface
local/libpipewire 1:1.0.5-1
    Low-latency audio/video router and processor - client library
local/libvips 8.15.1-5
    A fast image processing library with low memory needs
local/libvpl 2.10.2-1
    Intel Video Processing Library
local/lsof 4.99.3-2
    Lists open files for running Unix processes
local/m4 1.4.19-3
    The GNU macro processor
local/pipewire 1:1.0.5-1
    Low-latency audio/video router and processor
local/pipewire-alsa 1:1.0.5-1
    Low-latency audio/video router and processor - ALSA configuration
local/pipewire-audio 1:1.0.5-1
    Low-latency audio/video router and processor - Audio support
local/pipewire-jack 1:1.0.5-1
    Low-latency audio/video router and processor - JACK replacement
local/pipewire-pulse 1:1.0.5-1
    Low-latency audio/video router and processor - PulseAudio replacement
local/procps-ng 4.0.4-3
    Utilities for monitoring your system and its processes
local/psmisc 23.7-1
    Miscellaneous procfs tools
local/rccl 6.0.2-1
    ROCm Communication Collectives Library
local/rocalution 6.0.2-2
    Next generation library for iterative sparse solvers for ROCm platform
local/rocblas 6.0.2-1
    Next generation BLAS implementation for ROCm platform
local/rocfft 6.0.2-1
    Next generation FFT implementation for ROCm
local/rocm-clang-ocl 6.0.2-1
    OpenCL compilation with clang compiler
local/rocm-cmake 6.0.2-1
    CMake modules for common build tasks needed for the ROCm software stack
local/rocm-core 6.0.2-2
    AMD ROCm core package (version files)
local/rocm-device-libs 6.0.2-1
    ROCm Device Libraries
local/rocm-hip-libraries 6.0.2-1
    Develop certain applications using HIP and libraries for AMD platforms
local/rocm-hip-runtime 6.0.2-1
    Packages to run HIP applications on the AMD platform
local/rocm-hip-sdk 6.0.2-1
    Develop applications using HIP and libraries for AMD platforms
local/rocm-language-runtime 6.0.2-1
    ROCm runtime
local/rocm-llvm 6.0.2-1
    Radeon Open Compute - LLVM toolchain (llvm, clang, lld)
local/rocm-opencl-runtime 6.0.2-1
    OpenCL implementation for AMD
local/rocm-opencl-sdk 6.0.2-1
    Develop OpenCL-based applications for AMD platforms
local/rocm-smi-lib 6.0.2-1
    ROCm System Management Interface Library
local/rocminfo 6.0.2-1
    ROCm Application for Reporting System Info
local/rocprim 6.0.2-1
    Header-only library providing HIP parallel primitives
local/rocrand 6.0.2-1
    Pseudo-random and quasi-random number generator on ROCm
local/rocsolver 6.0.2-1
    Subset of LAPACK functionality on the ROCm platform
local/rocsparse 6.0.2-2
    BLAS for sparse computation on top of ROCm
local/rocthrust 6.0.2-1
    Port of the Thrust parallel algorithm library atop HIP/ROCm
local/roctracer 6.0.2-1
    ROCm tracer library for performance tracing
local/spirv-tools 2023.6-1 (vulkan-devel)
    API and commands for processing SPIR-V modules
local/vapoursynth R66-2
    A video processing framework with the future in mind
local/webrtc-audio-processing-1 1.3-2
    AudioProcessing library based on Google's implementation of WebRTC

Snektron commented 6 months ago

I double-checked, and it seems that the fix for https://github.com/ROCm/rocPRIM/issues/452 only made its way into ROCm 6.1, not 6.0. Unfortunately, it seems that the Arch Linux ROCm package has not been updated to 6.1 yet, but it should work via docker.
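As a rough way to tell whether a locally installed rocprim package predates the fix, its version string can be compared against 6.1. The sketch below is only an illustration: both function names are invented for this example, it assumes the distribution's rocprim package version tracks the ROCm release (as the 6.0.2-1 in the pacman output above suggests), and it assumes pacman prints its fields in English.

```shell
#!/bin/sh
# Hypothetical helper (not part of ROCm): succeeds when a version string of the
# form "major.minor[.patch][-rel]", such as "6.0.2-1", is 6.1 or newer.
rocprim_has_fix() {
    ver="${1%%-*}"          # drop a packaging suffix such as "-1"
    major="${ver%%.*}"      # text before the first dot
    rest="${ver#*.}"        # text after the first dot
    minor="${rest%%.*}"     # text up to the next dot
    [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 1 ]; }
}

# Hypothetical helper: extract the "Version" field from `pacman -Qi` style
# output read on stdin.
pacman_version_field() {
    sed -n 's/^Version *: *//p'
}

# Only query pacman where it exists (Arch Linux); elsewhere this part is skipped.
if command -v pacman >/dev/null 2>&1; then
    installed="$(pacman -Qi rocprim 2>/dev/null | pacman_version_field)"
    if [ -n "$installed" ] && rocprim_has_fix "$installed"; then
        echo "rocprim $installed should include the fix"
    else
        echo "rocprim ${installed:-(not installed)} likely predates the fix"
    fi
fi
```

With the package listed in this issue (6.0.2-1), the check reports that the fix is likely missing, which matches the build error seen on the host.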

The hiprand issue occurs because hiprand and the other dependencies are not installed by default in the rocm/rocm-terminal image. You need to install them manually with sudo apt update && sudo apt install hiprand rocrand hipcub rocprim. For your reference, here is the complete list of commands that I ran to get a build for your GPU:

$ docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/rocm-terminal:6.1 bash
rocm-user@0b983450b199:~$ sudo apt update && sudo apt install hiprand rocrand hipcub rocprim
# [ ... lots of apt stuff ]
rocm-user@0b983450b199:~$ git clone https://github.com/ROCm/rocm-examples.git
Cloning into 'rocm-examples'...
# [ ... git stuff ]
rocm-user@0b983450b199:~$ cd rocm-examples/Applications/monte_carlo_pi/
rocm-user@0b983450b199:~/rocm-examples/Applications/monte_carlo_pi$ make CXXFLAGS=--offload-arch=gfx1034
rocm-user@0b983450b199:~/rocm-examples/Applications/monte_carlo_pi$

I need to explicitly pass CXXFLAGS=--offload-arch=gfx1034 because I don't actually have an RX 6500, but the architecture should be detected automatically on your system if you just use make.
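To see which value --offload-arch would take on a machine that does have the GPU installed, the target name can be read from rocminfo output. This is a minimal sketch that assumes the output format shown earlier in this issue; gfx_targets is a name invented for this example, not a ROCm tool.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct gfx target name found on stdin.
gfx_targets() {
    grep -oE 'gfx[0-9a-fA-F]+' | sort -u
}

# Only call rocminfo where it is installed; elsewhere this part is skipped.
if command -v rocminfo >/dev/null 2>&1; then
    for arch in $(rocminfo | gfx_targets); do
        # Print the command instead of running it, so the sketch has no side effects.
        echo "make CXXFLAGS=--offload-arch=$arch"
    done
fi
```

On the system from this issue, the rocminfo output contains both gfx1034 and amdgcn-amd-amdhsa--gfx1034, so the helper would print gfx1034 once.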

On a side note, the CMake command fails to configure because the Ubuntu image that rocm/rocm-terminal is based on does not ship a recent enough CMake version. In the dockerfiles included with the project, which are also based on rocm/rocm-terminal, we download a more recent version of CMake manually. I'd suggest just using these dockerfiles, but they aren't updated to 6.1 yet.
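A quick check of the installed CMake against the 3.21.3 minimum from the configure error can save a failed attempt. The sketch below is an illustration, not part of the project: version_at_least is a name invented here, and it relies on the -V (version sort) option of GNU sort, which is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is greater than or equal to $2.
# Uses GNU sort -V, so the smaller of the two versions is listed first.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="3.21.3"   # minimum stated in the CMake error in this issue
if command -v cmake >/dev/null 2>&1; then
    # The first line of `cmake --version` reads "cmake version X.Y.Z".
    installed="$(cmake --version | head -n 1 | awk '{print $3}')"
    if version_at_least "$installed" "$required"; then
        echo "cmake $installed is new enough"
    else
        echo "cmake $installed is older than $required; build with make or install a newer CMake"
    fi
else
    echo "cmake not found on PATH"
fi
```

In the container from the log above, which reports CMake 3.16.3, this takes the "older than" branch.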

Also, the original issue description says that you have a Radeon VII, but the output from rocminfo --support shows an RX 6500. I think something is not correct there...

Please let me know if this helps.

Gardene-el commented 6 months ago

Thanks a lot, this helps. The build works after installing the required packages. About the original description, I'm sorry for that: there was no option for the RX 6500, and the field is required. I noted this under Additional Information.