ROCm / HIP

HIP: C++ Heterogeneous-Compute Interface for Portability
https://rocmdocs.amd.com/projects/HIP/
MIT License
3.71k stars, 528 forks

Differences in throughput for an application in HIP/SYCL #3441

Closed. jinz2014 closed this issue 5 months ago

jinz2014 commented 5 months ago

The attached paper shows that the throughput of the application in SYCL is higher than that of the HIP program, but it does not explain the performance difference.

Attachment: "Unlocking performance portability on LUMI-G supercomputer: A virtual screening case study" (3648115.3648125.pdf)

bdenhollander commented 5 months ago

5.1.1 Software stack. [...] Moreover, we used the HIPIFY tool 4 to automatically generate a HIP implementation from the CUDA one, based on HIP 5.3. We have used the ROCm LLVM’s to perform a code build of the HIP version on AMD GPUs

5.2 Single GPU performance portability [...] Moreover, we include an automatically generated HIP version for AMD GPUs, while for NVIDIA GPUs, we include a hand-optimized CUDA version.

It doesn't sound like they made any effort to tune the generated HIP version. The hand-optimized CUDA results on the A100 are double those of AdaptiveCpp, so there's a good chance that a hand-optimized HIP version could also outperform SYCL.
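To illustrate why an automatically generated port tends to be untuned, here is a minimal sketch of the kind of rename-based translation a HIPIFY-style tool performs. It is not the real tool: `toy_hipify`, the small `CUDA_TO_HIP` table, and the `scale` kernel are hypothetical names made up for this example, and the real HIPIFY tools cover far more of the API and may also rewrite the launch syntax. The runtime calls map to their HIP equivalents while the launch configuration chosen for NVIDIA hardware is carried over unchanged.

```python
import re

# A tiny, illustrative subset of CUDA -> HIP runtime API renames.
# Assumption: this table is only a sketch, not the mapping used by HIPIFY itself.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Try longer names first so "cudaMemcpyHostToDevice" is not matched as "cudaMemcpy".
_PATTERN = re.compile(
    "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
)


def toy_hipify(source: str) -> str:
    """Rename known CUDA identifiers to HIP ones; everything else is left as-is."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)


# Hypothetical CUDA host code with a block size of 256 picked for an NVIDIA GPU.
cuda_src = """#include <cuda_runtime.h>
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
scale<<<blocks, 256>>>(d_x, n);
cudaDeviceSynchronize();
cudaFree(d_x);
"""

hip_src = toy_hipify(cuda_src)
print(hip_src)
```

In this sketch the kernel launch line passes through untouched, so any block size, memory layout, or unrolling decision made for the CUDA target is inherited by the HIP build. Those are the knobs a hand-optimized HIP version would revisit.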