Open mathialakan opened 9 months ago
My guess is the masked __shfl() which, I think, is currently not supported by chipStar.
chipStar’s warp size is 32 by default. It does not seem that miniMDock’s HIP code accounts this size. For example, the WARPMINIMUMEXCHANGE call here will definitely compute out-of-bounds warp indices for __shfl()
calls here which is probably undefined behavior.
If the Intel-PVC supports 64 wide subgroups in Level Zero or OpenCL, you could try changing chipStar’s warp size to it with -DDEFAULT_WARP_SIZE=64
configure option.
FYI, I had trouble compiling miniMDock for chipStar because of __any()
calls because they trigger an assertion regarding a SPIR-V version in llvm-spirv. My guess is that our __any()
implementation requires a SPIR-V version higher than what is being generated now (v1.2). Replacing the calls with 1
helped me to get past the issue and the changes do not seem to affect the result on ROCm.
Thanks @pjaaskel - so any alternative way to replace __shfl()?
Thanks @linehill for pointing-out the issue with WARPMINIMUMEXCHANGE, I changed the WARPMINIMUM2 micro to dynamically work for the warp size based on the building system.
FYI, I had trouble compiling miniMDock for chipStar because of
__any()
calls because they trigger an assertion regarding a SPIR-V version in llvm-spirv. My guess is that our__any()
implementation requires a SPIR-V version higher than what is being generated now (v1.2). Replacing the calls with1
helped me to get past the issue and the changes do not seem to affect the result on ROCm.
I think it calls ballot which is not supported in v1.2, but is supported by Intel's driver regardless. I've patched that in the chipStar LLVM-SPIRV branch so it generates it regardless and hopes for the best: https://github.com/CHIP-SPV/SPIRV-LLVM-Translator/commit/0d66986d102f782f5d91c22a1f8e50e0bf2cfbe8
@mathialakan for the masked shfl, currently there is no easy workaround, but just try to reorganize the client code to use uniform shuffles.
@pjaaskel should we close this and/or open a new issue with for implementing masked shfl support?
Hi
The HIP code of miniMDock (https://github.com/ORNL-PE/miniMDock/tree/sycl_dev ) is working perfectly in AMD systems, but on Intel-PVC systems, it builds successfully but giving wrong results. The results obtained for the 7cpa test case ( -lfile ./input/7cpa/7cpa_ligand.pdbqt -nrun 10) on Arognne's sunspot using iprof is attached here, chipstar_iprof_sunspot.pdf
The correct result should be look like this, 7cpa_results.pdf
The code is using warp-level shuffle intrinsics for reduction (device/hip/kernels.cpp) and the code is able to setup warp/wavefront/subgroup size based on the running-system. Would you please help me to resolve the issues related to this build.
Thanks Mathi