Open Sarbojit2019 opened 7 months ago
Could you trace where the __atomic_fetch_add_8
is coming from? The function you see is a compiler built-in and not a HIP language feature. It is possible that there is an user error involved or an unintended device code injection via a constexpr
function (constexpr
functions are implicitly __host__ __device__
in HIP/CUDA mode).
Attempted to reproduce the issue with:
#include <hip/hip_runtime.h>
__device__ int x = 0;
__global__ void k() { __atomic_fetch_add(&x, 1, __ATOMIC_RELAXED); }
But this gets blocked by an assertion:
$ ../install/bin/hipcc atomic-builtins.hip -c
atomic-builtins.hip:3:23: warning: large atomic operation may incur significant performance penalty; the access size (4 bytes) exceeds the max lock-free size (0 bytes) [-Watomic-alignment]
__global__ void k() { __atomic_fetch_add(&x, 1, __ATOMIC_RELAXED); }
^
clang-16: /mnt/md0/linehill/ws-chip-spv-3/llvm-project/llvm/lib/IR/Instructions.cpp:3356: static llvm::CastInst* llvm::CastInst::Create(llvm::Instruction::CastOps, llvm::Value*, llvm::Type*, const llvm::Twine&, llvm::Instruction*): Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
...
There is a possibility that the QuickSilver app is compiled with clang without assertions enabled and the compiler produces ill-formed code.
@linehill How do I enable/disable assertion in compiler? I have built clang locally in release mode and using the same for compiling QuickSilver. Are you suspecting it is bad kernel which is creating the issue? As per the repo looks like same code works for AMD.
How do I enable/disable assertion in compiler?
The assertions are enabled with -DCMAKE_BUILD_TYPE=Debug.
Are you suspecting it is bad kernel which is creating the issue?
Not sure yet. There might be very unexpected way the __atomic_fetch_add_8
gets introduced into the device code. Tracing the source of it could help here.
@Sarbojit2019, any success tracing the origin of the __atomic_fetch_add_8
? I made a sanity check patch that may help on the tracing. Pull the patch and rebuild chipStar in Debug mode to enable the sanity checker.
@Sarbojit2019 status?
While running the QuickSilver app with Levelzero as backed I see a crash. Looks like "__atomic_fetch_add_8" support is missing.
QuickSilver : https://github.com/oneapi-src/Velocity-Bench/tree/main/QuickSilver
Error message: CHIP warning [TID 1675333] [1700634335.412708246] : Missing definition for '__atomic_fetch_add_8' CHIP warning [TID 1675333] [1700634335.412844660] : SPIR-V Parser: MemberId 82 not found in type map CHIP warning [TID 1675333] [1700634335.412847436] : SPIR-V Parser: MemberId 82 not found in type map CHIP warning [TID 1675333] [1700634335.412849475] : SPIR-V Parser: MemberId 82 not found in type map CHIP warning [TID 1675333] [1700634335.412873560] : SPIR-V Parser: MemberId 192 not found in type map CHIP warning [TID 1675333] [1700634335.412879170] : SPIR-V Parser: MemberId 192 not found in type map CHIP warning [TID 1675333] [1700634335.412882545] : SPIR-V Parser: MemberId 194 not found in type map CHIP error [TID 1675333] [1700634336.321535904] : hipErrorTbd (ZE_RESULT_ERROR_INVALID_MODULE_UNLINKED ) in /home/sarbojit/src/chipStar/src/backend/Level0/CHIPBackendLevel0.cc:2482:compile
CHIP error [TID 1675333] [1700634336.321705150] : Caught Error: hipErrorTbd error: #1054 (hipErrorTbd Aborted (core dumped)