CHIP-SPV / chipStar

chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.
Other
184 stars 29 forks source link

Wrong result for miniMDock #726

Open mathialakan opened 9 months ago

mathialakan commented 9 months ago

Hi

The HIP code of miniMDock (https://github.com/ORNL-PE/miniMDock/tree/sycl_dev ) is working perfectly in AMD systems, but on Intel-PVC systems, it builds successfully but giving wrong results. The results obtained for the 7cpa test case ( -lfile ./input/7cpa/7cpa_ligand.pdbqt -nrun 10) on Arognne's sunspot using iprof is attached here, chipstar_iprof_sunspot.pdf

The correct result should be look like this, 7cpa_results.pdf

The code is using warp-level shuffle intrinsics for reduction (device/hip/kernels.cpp) and the code is able to setup warp/wavefront/subgroup size based on the running-system. Would you please help me to resolve the issues related to this build.

Thanks Mathi

pjaaskel commented 9 months ago

My guess is the masked __shfl() which, I think, is currently not supported by chipStar.

linehill commented 9 months ago

chipStar’s warp size is 32 by default. It does not seem that miniMDock’s HIP code accounts this size. For example, the WARPMINIMUMEXCHANGE call here will definitely compute out-of-bounds warp indices for __shfl() calls here which is probably undefined behavior.

If the Intel-PVC supports 64 wide subgroups in Level Zero or OpenCL, you could try changing chipStar’s warp size to it with -DDEFAULT_WARP_SIZE=64 configure option.

FYI, I had trouble compiling miniMDock for chipStar because of __any() calls because they trigger an assertion regarding a SPIR-V version in llvm-spirv. My guess is that our __any() implementation requires a SPIR-V version higher than what is being generated now (v1.2). Replacing the calls with 1 helped me to get past the issue and the changes do not seem to affect the result on ROCm.

mathialakan commented 9 months ago

Thanks @pjaaskel - so any alternative way to replace __shfl()?

Thanks @linehill for pointing-out the issue with WARPMINIMUMEXCHANGE, I changed the WARPMINIMUM2 micro to dynamically work for the warp size based on the building system.

pjaaskel commented 9 months ago

FYI, I had trouble compiling miniMDock for chipStar because of __any() calls because they trigger an assertion regarding a SPIR-V version in llvm-spirv. My guess is that our __any() implementation requires a SPIR-V version higher than what is being generated now (v1.2). Replacing the calls with 1 helped me to get past the issue and the changes do not seem to affect the result on ROCm.

I think it calls ballot which is not supported in v1.2, but is supported by Intel's driver regardless. I've patched that in the chipStar LLVM-SPIRV branch so it generates it regardless and hopes for the best: https://github.com/CHIP-SPV/SPIRV-LLVM-Translator/commit/0d66986d102f782f5d91c22a1f8e50e0bf2cfbe8

@mathialakan for the masked shfl, currently there is no easy workaround, but just try to reorganize the client code to use uniform shuffles.

pvelesko commented 3 months ago

@pjaaskel should we close this and/or open a new issue with for implementing masked shfl support?