Open pvelesko opened 11 months ago
I was able to resolve the JIT failures by using an older runtime:
Currently Loaded Modules:
 1) mpich/51.2/icc-all-pmix-gpu
 2) libfabric/1.15.2.0
 3) cray-pals/1.2.12
 4) cray-libpals/1.2.12
 5) prepend-deps/default
 6) append-deps/default
 7) cmake/3.26.4
 8) gcc/12.1.0
 9) gdb/13.1
10) HIP/chipStar/llvm15/latest/debug
11) HIP/hipBLAS/chip-spv-latest
12) clang/clang15-spirv-omp
13) intel_compute_runtime/release/agama-devel-627
14) oneapi/eng-compiler/2023.05.15.003
@pengtu would providing the SPIR-V suffice for the reproducer?
Yes
Peng
pvelesko@x1921c6s5b0n0:~/libCEED> ./build/t550-operator /gpu/hip/gen
Computed Area Coarse Grid: 0.000000 != True Area: 2.0
Computed Area Fine Grid: 0.000000 != True Area: 2.0
CHIP error [TID 5152] [1692797160.862680477] : hipErrorLaunchFailure (Failed to find kernel via kernel name: CeedKernelHipGenOperator_Scale) in /home/pvelesko/chipStar/main/src/CHIPBackend.cc:269:getKernelByName
CHIP error [TID 5152] [1692797160.866513927] : Caught Error: hipErrorLaunchFailure
/home/pvelesko/libCEED/backends/hip/ceed-hip-compile.cpp:125 in CeedGetKernelHip(): hipErrorLaunchFailure
Aborted (core dumped)
clinfo driver version: 23.17.26241.22
Attached are the two SPIR-V files that have CeedKernelHipGenOperator_Scale in them.
Failing for runtime 647, passing for runtime 627 but giving a correctness error that might be unrelated to the runtime.
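For anyone triaging the attached modules, a quick way to confirm that a SPIR-V file really declares the kernel named in the error is to list its entry points. The sketch below is not part of chipStar or libCEED; the helper name `list_entry_points` is made up for illustration. It only relies on the SPIR-V binary layout (a five-word header followed by instructions, with `OpEntryPoint` being opcode 15).

```python
import struct

SPIRV_MAGIC = 0x07230203
OP_ENTRY_POINT = 15  # opcode of OpEntryPoint in the SPIR-V specification


def list_entry_points(blob: bytes) -> list[str]:
    """Return the entry-point names declared in a SPIR-V binary (hypothetical helper)."""
    if len(blob) < 20 or len(blob) % 4 != 0:
        raise ValueError("not a well-formed SPIR-V binary")

    # The magic number tells us the endianness of the word stream.
    if struct.unpack_from("<I", blob, 0)[0] == SPIRV_MAGIC:
        endian = "<"
    elif struct.unpack_from(">I", blob, 0)[0] == SPIRV_MAGIC:
        endian = ">"
    else:
        raise ValueError("SPIR-V magic number not found")

    words = struct.unpack(f"{endian}{len(blob) // 4}I", blob)
    names = []
    i = 5  # skip the header: magic, version, generator, bound, schema
    while i < len(words):
        word_count = words[i] >> 16
        opcode = words[i] & 0xFFFF
        if word_count == 0 or i + word_count > len(words):
            raise ValueError("truncated or corrupt instruction stream")
        if opcode == OP_ENTRY_POINT:
            # Operands: execution model, entry-point id, then the name as a
            # null-terminated string packed low-order byte first into words.
            raw = struct.pack(f"<{word_count - 3}I", *words[i + 3 : i + word_count])
            names.append(raw.split(b"\0", 1)[0].decode("utf-8"))
        i += word_count
    return names
```

Calling `list_entry_points(open(path, "rb").read())` on each attached file (where `path` is wherever the attachment was saved) and checking for `CeedKernelHipGenOperator_Scale` in the result would show whether the name is missing from the module itself or is lost later, when the runtime builds it.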
@pengtu
@pengtu Can you confirm that you received the SPIR-V and it's sufficient?
@pengtu
Filed the issue with the Intel compute runtime:
@pvelesko: do you have a module binary that can be shared with the GPU driver team? Please check the request in the tracking issue filed with compute-runtime above.
Using /gpu/hip/gen
Level Zero Failures:
OpenCL in these cases either hangs:
or fails with the following error: