Hello there!
I'm having a performance issue when running the rocHPL benchmark compiled with the ROCm hipcc C++ compiler.
I have compiled two versions of rocHPL with two different compilers: one with the default GNU 7.5.0 C++ compiler, and the other with Clang 15.0.0, i.e. the hipcc taken from the rocm/5.3.3/hip/bin directory.
For the first version (compiled with GNU), I used the basic install.sh script with the --with-rocm option pointing to the rocm/5.3.3 directory, from which rocBLAS was also used. Additionally, the --with-mpi option was used to specify my previously installed OpenMPI.
The second version (compiled with Clang) was built using a modified version of the install.sh script, with
-DCMAKE_CXX_COMPILER=hipcc
in the cmake_common_options variable. The other options for the install.sh script were the same as for the first version.
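A rough sketch of the two build configurations described above, written as a dry run that only prints the commands. The paths in ROCM_PATH and MPI_PATH are placeholders, not the actual locations on my system, and the `--option=<dir>` form of the install.sh arguments is an assumption.

```shell
#!/bin/sh
# Dry-run sketch: prints the two build configurations, does not build anything.
# Both paths below are hypothetical placeholders.
ROCM_PATH="/opt/rocm-5.3.3"      # assumed location of the rocm/5.3.3 directory
MPI_PATH="/opt/openmpi"          # assumed prefix of the existing OpenMPI install

# Options shared by both builds (the "=<dir>" syntax is an assumption).
common_opts="--with-rocm=$ROCM_PATH --with-mpi=$MPI_PATH"

# Build 1: unmodified install.sh, default GNU C++ compiler.
gnu_cmd="./install.sh $common_opts"

# Build 2: same options, but install.sh was edited so that
# cmake_common_options also contains this flag.
hipcc_flag="-DCMAKE_CXX_COMPILER=hipcc"
hipcc_cmd="./install.sh $common_opts"

echo "GNU build:   $gnu_cmd"
echo "hipcc build: $hipcc_cmd (cmake_common_options includes $hipcc_flag)"
```

The only intended difference between the two builds is the C++ compiler passed to CMake; everything else, including the rocBLAS and OpenMPI locations, stays the same.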
When I run my benchmarks on the AMD Radeon Instinct MI50 32G with the same HPL.dat configuration, I get different results.
The GFLOPS achieved with the hipcc-compiled rocHPL are always lower than those achieved with the GNU-compiled rocHPL.
For example, here are the performance results for different values of the N parameter using the same HPL.dat file:
What causes this to happen? How can I correctly compile rocHPL using hipcc and avoid performance issues during benchmarking?