Closed jglaser closed 5 months ago
I was able to build the essential parts of the ROCm framework locally, including hcc
. One offending kernel with __launch_bounds__
now fails to compile , which is perhaps somewhat more helpful than not running at all (which happened for another kernel with __launch_bounds__
)
from the make output with hcc
LLVM ERROR: could not scavenge SGPR to spill in entry function
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0. Program arguments: /home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc -mtriple amdgcn-amd-amdhsa -mcpu=gfx906 -mattr=-code-object-v3 -O3 --frame-pointer=none -mattr=+sram-ecc -amdgpu-function-calls=0 -filetype=obj -o /home/users/coe0176/tmp/tmp.uBeLludxuR/kernel-gfx906.hsaco.isabin /home/users/coe0176/tmp/tmp.uBeLludxuR/kernel-gfx906.hsaco.opt.bc
1. Running pass 'CallGraph Pass Manager' on module '/home/users/coe0176/tmp/tmp.uBeLludxuR/kernel-gfx906.hsaco.opt.bc'.
2. Running pass 'Prologue/Epilogue Insertion & Frame Finalization' on function '@_ZN4hpmc3gpu6kernel26clusters_insert_depletantsINS_21ShapeConvexPolyhedronELj1024EEEvPK15HIP_vector_typeIdLj4EES7_PKjb4vec3IdE4quatIdES9_S9_7Index2DS4_IjLj3EES4_IdLj3EE7Index3DjjS9_SE_jj6BoxDimPKNT_10param_typeEjjjSE_PjPS4_IjLj2EEjjjS9_'
#0 0x000000000195302a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc+0x195302a)
#1 0x0000000001950e64 llvm::sys::RunSignalHandlers() (/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc+0x1950e64)
#2 0x0000000001950f98 SignalHandler(int) (/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc+0x1950f98)
#3 0x00002aaaaacde5d0 __restore_rt (/lib64/libpthread.so.0+0xf5d0)
#4 0x00002aaaab8592c7 raise (/lib64/libc.so.6+0x362c7)
#5 0x00002aaaab85a9b8 abort (/lib64/libc.so.6+0x379b8)
#6 0x00000000018d995f llvm::report_fatal_error(llvm::Twine const&, bool) (/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc+0x18d995f)
#7 0x00000000018d9a68 (/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc+0x18d9a68)
#8 0x000000000083010a (/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc+0x83010a)
#9 0x0000000000831de3 llvm::SIRegisterInfo::eliminateFrameIndex(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, int, unsigned int, llvm::RegScavenger*) const (/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc+0x831de3)
#10 0x0000000001060734 (anonymous namespace)::PEI::replaceFrameIndices(llvm::MachineBasicBlock*, llvm::MachineFunction&, int&) (.isra.248) (/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc+0x1060734)
#11 0x0000000001067f88 (anonymous namespace)::PEI::runOnMachineFunction(llvm::MachineFunction&) (/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc+0x1067f88)
#12 0x0000000000f7811d llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (.part.45) (/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc+0xf7811d)
#13 0x00000000012dfc97 llvm::FPPassManager::runOnFunction(llvm::Function&) (/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc+0x12dfc97)
#14 0x0000000000c22b07 (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) (/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc+0xc22b07)
#15 0x00000000012e06b8 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc+0x12e06b8)
#16 0x0000000000689241 compileModule(char**, llvm::LLVMContext&) (/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc+0x689241)
#17 0x000000000061ff22 main (/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc+0x61ff22)
#18 0x00002aaaab845495 __libc_start_main (/lib64/libc.so.6+0x22495)
#19 0x0000000000683403 _start (/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/llc+0x683403)
/home/users/coe0176/miniconda3/envs/hip-master-hcc/bin/clamp-device: line 262: 49930 Aborted $LLC -mtriple amdgcn-amd-amdhsa -mcpu=$AMDGPU_TARGET $CODE_OBJECT_FORMAT $HCC_OPT $KMOPTLLC -amdgpu-function-calls=$AMDGPU_FUNC_CALLS -filetype=obj -o $2.isabin $2.opt.bc
Generating AMD GCN kernel failed in llc for target: gfx906
$ hcc --version
HCC clang version 11.0.0 (/home/users/coe0176/hcc/llvm-project/clang b4512f1f1817d745b0dddc05ec09728eb2593183) (based on HCC 3.1.20162-df05d4f-b4512f1 )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/users/coe0176/miniconda3/envs/hip-master-hcc/bin
@jglaser Sorry for the lack of response. Please try latest ROCm 6.0.2 (HIP 6.0.32831) to see if your issue still exists? If resolved, please close the ticket. Thanks.
I am porting a CUDA kernel with
__launch_bounds__
specifier to HIP. The idea is to template the kernel over a range of launch bounds, thus compiling several configurations of the kernel with different numbers of registers, and find the optimal one by auto-tuning.However, I am unable to make it work on AMD hardware. HIP changes the semantics of the second argument of the launch bounds attribute from blocks per multiprocessor to active warps per execution unit according to the documentation. https://github.com/ROCm-Developer-Tools/HIP/blob/master/docs/markdown/hip_kernel_language.md
However, looking at the implementation it looks like it's actually compatible with the CUDA one.
from
include/hip/hcc_detail/hip_runtime.h
:So far so good, so I am simply leaving my CUDA code unchanged and it compiles. My kernel, without launch bounds, supports a maximum block size of 256 according to
hipFuncGetAttributes()
,maxThreadsPerBlock
. With__launch_bounds__(1024)
that changes to 1024 as expected. However, I am unable to run the kernel at the maximum block size then, I get:Everything works as expected if I leave out the
__launch_bounds__
attribute and clamp the block size down to 256 (or whateverhipFuncGetAttributes()
returns).Any ideas? I can point you to the source code (HOOMD-blue branch) if you need to reproduce, but I am afraid I can't come with a single kernel that is complex enough to reproduce.