ROCm / HIP

HIP: C++ Heterogeneous-Compute Interface for Portability
https://rocmdocs.amd.com/projects/HIP/
MIT License
3.55k stars 519 forks source link

[Issue]: COMGR failed to get code object ISA name #3436

Closed mariodirenzo closed 3 months ago

mariodirenzo commented 3 months ago

Problem Description

Hello, I am trying to port my code to HIP to use AMD GPUs. After minor modifications, I can compile the code with HIP 5.7.1 using -fgpu-rdc. However, when I execute some simple tests, it seems that none of the GPU kernels are executed. The executable does not throw a specific error but even launching a simple kernel like

__global__ void run_printf() { printf("Hello World\n"); }

does not produce any output.

If I execute the code with AMD_LOG_LEVEL=1, I see the following error

:1:devprogram.cpp           :1874: 76368963088 us: [pid:1206355 tid:0x154cb1a7c680] Error: COMGR failed to get code object ISA name.
Error: create kernel metadata map using COMgr
Error: Cannot Find Global Var Sizes
Error: Cannot create kernels.

I was wondering if have any suggestions to investigate what is going on with my code and how to fix it.

Operating System

Red Hat Enterprise Linux 8.9 (Ootpa)

CPU

AMD EPYC 7A53 64-Core Processor

GPU

AMD Instinct MI250X

ROCm Version

ROCm 5.7.1

ROCm Component

HIP, hipCUB, hipFFT

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

ROCk module is loaded

HSA System Attributes

Runtime Version: 1.1 System Timestamp Freq.: 1000.000000MHz Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) Machine Model: LARGE System Endianness: LITTLE Mwaitx: DISABLED DMAbuf Support: YES

========== HSA Agents


Agent 1


Name: AMD EPYC 7A53 64-Core Processor Uuid: CPU-XX Marketing Name: AMD EPYC 7A53 64-Core Processor Vendor Name: CPU Feature: None specified Profile: FULL_PROFILE Float Round Mode: NEAR Max Queue Number: 0(0x0) Queue Min Size: 0(0x0) Queue Max Size: 0(0x0) Queue Type: MULTI Node: 0 Device Type: CPU Cache Info: L1: 32768(0x8000) KB Chip ID: 0(0x0) ASIC Revision: 0(0x0) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 3541 BDFID: 0 Internal Node ID: 0 Compute Unit: 32 SIMDs per CU: 0 Shader Engines: 0 Shader Arrs. per Eng.: 0 WatchPts on Addr. Ranges:1 Features: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 131572544(0x7d7a340) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 2 Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED Size: 131572544(0x7d7a340) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 3 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 131572544(0x7d7a340) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE ISA Info:


Agent 2


Name: AMD EPYC 7A53 64-Core Processor Uuid: CPU-XX Marketing Name: AMD EPYC 7A53 64-Core Processor Vendor Name: CPU Feature: None specified Profile: FULL_PROFILE Float Round Mode: NEAR Max Queue Number: 0(0x0) Queue Min Size: 0(0x0) Queue Max Size: 0(0x0) Queue Type: MULTI Node: 1 Device Type: CPU Cache Info: L1: 32768(0x8000) KB Chip ID: 0(0x0) ASIC Revision: 0(0x0) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 3541 BDFID: 0 Internal Node ID: 1 Compute Unit: 32 SIMDs per CU: 0 Shader Engines: 0 Shader Arrs. per Eng.: 0 WatchPts on Addr. Ranges:1 Features: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 132111236(0x7dfdb84) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 2 Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED Size: 132111236(0x7dfdb84) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 3 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 132111236(0x7dfdb84) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE ISA Info:


Agent 3


Name: AMD EPYC 7A53 64-Core Processor Uuid: CPU-XX Marketing Name: AMD EPYC 7A53 64-Core Processor Vendor Name: CPU Feature: None specified Profile: FULL_PROFILE Float Round Mode: NEAR Max Queue Number: 0(0x0) Queue Min Size: 0(0x0) Queue Max Size: 0(0x0) Queue Type: MULTI Node: 2 Device Type: CPU Cache Info: L1: 32768(0x8000) KB Chip ID: 0(0x0) ASIC Revision: 0(0x0) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 3541 BDFID: 0 Internal Node ID: 2 Compute Unit: 32 SIMDs per CU: 0 Shader Engines: 0 Shader Arrs. per Eng.: 0 WatchPts on Addr. Ranges:1 Features: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 132111236(0x7dfdb84) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 2 Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED Size: 132111236(0x7dfdb84) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 3 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 132111236(0x7dfdb84) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE ISA Info:


Agent 4


Name: AMD EPYC 7A53 64-Core Processor Uuid: CPU-XX Marketing Name: AMD EPYC 7A53 64-Core Processor Vendor Name: CPU Feature: None specified Profile: FULL_PROFILE Float Round Mode: NEAR Max Queue Number: 0(0x0) Queue Min Size: 0(0x0) Queue Max Size: 0(0x0) Queue Type: MULTI Node: 3 Device Type: CPU Cache Info: L1: 32768(0x8000) KB Chip ID: 0(0x0) ASIC Revision: 0(0x0) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 3541 BDFID: 0 Internal Node ID: 3 Compute Unit: 32 SIMDs per CU: 0 Shader Engines: 0 Shader Arrs. per Eng.: 0 WatchPts on Addr. Ranges:1 Features: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 132036012(0x7deb5ac) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 2 Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED Size: 132036012(0x7deb5ac) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE Pool 3 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 132036012(0x7deb5ac) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: TRUE ISA Info:


Agent 5


Name: gfx90a Uuid: GPU-7facaa47a33fa808 Marketing Name: AMD Instinct MI250X Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 4 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 29704(0x7408) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1700 BDFID: 49408 Internal Node ID: 4 Compute Unit: 110 SIMDs per CU: 4 Shader Engines: 8 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 32(0x20) Max Work-item Per CU: 2048(0x800) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 78 SDMA engine uCode:: 8 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 4 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32


Agent 6


Name: gfx90a Uuid: GPU-66d905c29ef90dad Marketing Name: AMD Instinct MI250X Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 5 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 29704(0x7408) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1700 BDFID: 50688 Internal Node ID: 5 Compute Unit: 110 SIMDs per CU: 4 Shader Engines: 8 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 32(0x20) Max Work-item Per CU: 2048(0x800) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 78 SDMA engine uCode:: 8 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 4 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32


Agent 7


Name: gfx90a Uuid: GPU-0e6a3b0db4631bd7 Marketing Name: AMD Instinct MI250X Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 6 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 29704(0x7408) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1700 BDFID: 51456 Internal Node ID: 6 Compute Unit: 110 SIMDs per CU: 4 Shader Engines: 8 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 32(0x20) Max Work-item Per CU: 2048(0x800) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 78 SDMA engine uCode:: 8 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 4 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32


Agent 8


Name: gfx90a Uuid: GPU-85853daf6c4a5232 Marketing Name: AMD Instinct MI250X Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 7 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 29704(0x7408) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1700 BDFID: 52736 Internal Node ID: 7 Compute Unit: 110 SIMDs per CU: 4 Shader Engines: 8 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 32(0x20) Max Work-item Per CU: 2048(0x800) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 78 SDMA engine uCode:: 8 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 4 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32


Agent 9


Name: gfx90a Uuid: GPU-9acabd4df0fab5a9 Marketing Name: AMD Instinct MI250X Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 8 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 29704(0x7408) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1700 BDFID: 53504 Internal Node ID: 8 Compute Unit: 110 SIMDs per CU: 4 Shader Engines: 8 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 32(0x20) Max Work-item Per CU: 2048(0x800) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 78 SDMA engine uCode:: 8 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 4 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32


Agent 10


Name: gfx90a Uuid: GPU-ea272b94b7c02dc3 Marketing Name: AMD Instinct MI250X Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 9 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 29704(0x7408) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1700 BDFID: 54784 Internal Node ID: 9 Compute Unit: 110 SIMDs per CU: 4 Shader Engines: 8 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 32(0x20) Max Work-item Per CU: 2048(0x800) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 78 SDMA engine uCode:: 8 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 4 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32


Agent 11


Name: gfx90a Uuid: GPU-d2baea57cd168ab6 Marketing Name: AMD Instinct MI250X Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 10 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 29704(0x7408) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1700 BDFID: 55552 Internal Node ID: 10 Compute Unit: 110 SIMDs per CU: 4 Shader Engines: 8 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 32(0x20) Max Work-item Per CU: 2048(0x800) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 78 SDMA engine uCode:: 8 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 4 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32


Agent 12


Name: gfx90a Uuid: GPU-1dc5bc6a35de5e17 Marketing Name: AMD Instinct MI250X Vendor Name: AMD Feature: KERNEL_DISPATCH Profile: BASE_PROFILE Float Round Mode: NEAR Max Queue Number: 128(0x80) Queue Min Size: 64(0x40) Queue Max Size: 131072(0x20000) Queue Type: MULTI Node: 11 Device Type: GPU Cache Info: L1: 16(0x10) KB L2: 8192(0x2000) KB Chip ID: 29704(0x7408) ASIC Revision: 1(0x1) Cacheline Size: 64(0x40) Max Clock Freq. (MHz): 1700 BDFID: 56832 Internal Node ID: 11 Compute Unit: 110 SIMDs per CU: 4 Shader Engines: 8 Shader Arrs. per Eng.: 1 WatchPts on Addr. Ranges:4 Features: KERNEL_DISPATCH Fast F16 Operation: TRUE Wavefront Size: 64(0x40) Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Max Waves Per CU: 32(0x20) Max Work-item Per CU: 2048(0x800) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) Max fbarriers/Workgrp: 32 Packet Processor uCode:: 78 SDMA engine uCode:: 8 IOMMU Support:: None Pool Info: Pool 1 Segment: GLOBAL; FLAGS: COARSE GRAINED Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 2 Segment: GLOBAL; FLAGS: Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 3 Segment: GLOBAL; FLAGS: FINE GRAINED Size: 67092480(0x3ffc000) KB Allocatable: TRUE Alloc Granule: 4KB Alloc Alignment: 4KB Accessible by all: FALSE Pool 4 Segment: GROUP Size: 64(0x40) KB Allocatable: FALSE Alloc Granule: 0KB Alloc Alignment: 0KB Accessible by all: FALSE ISA Info: ISA 1 Name: amdgcn-amd-amdhsa--gfx90a:sramecc+:xnack- Machine Models: HSA_MACHINE_MODEL_LARGE Profiles: HSA_PROFILE_BASE Default Rounding Mode: NEAR Default Rounding Mode: NEAR Fast f16: TRUE Workgroup Max Size: 1024(0x400) Workgroup Max Size per Dimension: x 1024(0x400) y 1024(0x400) z 1024(0x400) Grid Max Size: 4294967295(0xffffffff) Grid Max Size per Dimension: x 4294967295(0xffffffff) y 4294967295(0xffffffff) z 4294967295(0xffffffff) FBarrier Max Size: 32 Done

Additional Information

No response

yxsamliu commented 3 months ago

what command did you use to compile and link the program?

mariodirenzo commented 3 months ago

The objects are compiled with lines (generated by Make) like

clang++ -DUSE_HDF -DUSE_HIP -D__HIP_PLATFORM_AMD__ -D__HIP_ROCclr__=1  [VARIOUS INCLUDE FOLDERS] -fgpu-rdc -O3 -Wall --offload-arch=gfx90a -fopenmp=libomp -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -std=gnu++17 -o   fft.cc.o -x hip -c fft.cc

The objects are collected into libraries with lines like

llvm-ar qc libprometeoSolver.a [OBJECT NAMES]

The program is linked with

hipcc_cmake_linker_helper /usr/bin -fgpu-rdc -O3 -Wall --offload-arch=gfx90a -Wl,--export-dynamic main.cc.o -o prometeo.exec   [ALL LIBRARIES]
yxsamliu commented 3 months ago

for a simple program like

#inclue "hip/hip_runtime.h"
__global__ void run_printf() { printf("Hello World\n"); }
int main() {
  run_printf<<<1,1>>>();
  hipDeviceSynchronize();
}

If you compile it directly with /opt/rocm/llvm/bin/clang++ --hip-link -O3 --offload-arch=gfx90a . Does it work?

mariodirenzo commented 3 months ago

yes, it does

yxsamliu commented 3 months ago

it could be due to some compiler bug about archive handling, which have been fixed in latest ROCm.

mariodirenzo commented 3 months ago

You are right. The problem goes away after switching to rocm 6.0.2. Thanks for your help