ecmwf-ifs / ectrans

Global spherical harmonics transforms library underpinning the IFS
Apache License 2.0
17 stars 33 forks source link

[redgreengpu] Fixes for CCE 17 #107

Open reuterbal opened 3 months ago

reuterbal commented 3 months ago

These have been obtained from vendor collaboration and supposedly fix compilation issue with with ROCm 5.7.1, CCE 17.0.0, OMP and GPU aware MPI.

ROCm 6.1.1 seems to cause additional issues with the discovery of include paths for hipblas, hipfft and the rocm equivalents. A potential fix is this:

diff --git a/cmake/ectrans_find_hip.cmake b/cmake/ectrans_find_hip.cmake
index 90fca11..b4142f5 100644
--- a/cmake/ectrans_find_hip.cmake
+++ b/cmake/ectrans_find_hip.cmake
@@ -110,10 +110,13 @@ macro( ectrans_find_hip )
     if( HAVE_HIP )
       list( APPEND ECTRANS_GPU_HIP_LIBRARIES ${hipblas_LIBRARIES} ${hipfft_LIBRARIES})
       list( APPEND ECTRANS_GPU_HIP_LIBRARIES ${rocblas_LIBRARIES} ${rocfft_LIBRARIES})
+      list( APPEND ECTRANS_GPU_HIP_INCLUDES ${hipblas_INCLUDE_DIRS}/hipblas ${hipfft_INCLUDE_DIRS}/hipfft)
+      list( APPEND ECTRANS_GPU_HIP_INCLUDES ${rocblas_INCLUDE_DIRS}/rocblas ${rocfft_INCLUDE_DIRS}/rocfft)
     endif()
   endif()
   ecbuild_info("HIP libraries: ${ECTRANS_GPU_HIP_LIBRARIES}")
+  ecbuild_info("HIP includes: ${ECTRANS_GPU_HIP_INCLUDES}")
 endmacro()
 macro( ectrans_declare_hip_sources )
diff --git a/src/trans/gpu/CMakeLists.txt b/src/trans/gpu/CMakeLists.txt
index 77d4e24..ad4c254 100644
--- a/src/trans/gpu/CMakeLists.txt
+++ b/src/trans/gpu/CMakeLists.txt
@@ -91,6 +91,7 @@ foreach( prec dp sp )
                                $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/algor/interface>
                                $<INSTALL_INTERFACE:include/ectrans>
                                $<INSTALL_INTERFACE:include>
+                               ${ECTRANS_GPU_HIP_INCLUDES}
           PUBLIC_LIBS          parkind_${prec}
                                fiat
           PRIVATE_LIBS         ${ECTRANS_GPU_HIP_LIBRARIES}

However, I'm hoping there are proper library targets that could/should be used instead, and which needs to be examined once we have access to the newer software stack versions.

wdeconinck commented 3 months ago

Perhaps we can wait a bit for this after #106 is merged in develop? Or merge here and keep a mental note that this will also need another PR to develop in addition, as 'redgreengpu' will not get merged to develop.

reuterbal commented 3 months ago

Yes, I fully agree. In fact, we don't have any means of testing this ourselves, yet, and therefore might want to keep this open until confirmed functional. It would still be useful to include this into redgreengpu even though it is not going to find its way into the main branches, since it continues to be the fallback for Cray+AMD platforms.

wdeconinck commented 2 months ago

We can merge the source code change then. I am also okay for you to add the cmake changes, but perhaps with the name "ECTRANS_GPU_HIP_INCLUDE_DIRS" instead of "...INCLUDES". This could be added to this PR as well.