AMReX-Codes / amrex

AMReX: Software Framework for Block Structured AMR
https://amrex-codes.github.io/amrex

Cannot build with CUDA and profiling on CUDA versions >= 12.5 #4153

Closed: rho-novatron closed this issue 2 months ago

rho-novatron commented 2 months ago

I'm trying to build on a fresh Ubuntu 24.04 install, with the requirements installed through conda. With CUDA 12.5 or later installed, the build does not even get through the CMake configure step, seemingly because nvtx3 provides nvTools as headers only rather than as a library.

There is some info here on how to use nvtx with cmake these days.

Here's the error message I get:

> cmake -S . -B build -DAMReX_GPU_BACKEND=CUDA

...

CMake Error at Tools/CMake/AMReXParallelBackends.cmake:71 (target_link_libraries):
  Target "amrex_3d" links to:

    CUDA::nvToolsExt

  but the target was not found.  Possible reasons include:

    * There is a typo in the target name.
    * A find_package call is missing for an IMPORTED target.
    * An ALIAS target is missing.

Call Stack (most recent call first):
  Src/CMakeLists.txt:40 (include)

-- Generating done (0.0s)
CMake Generate step failed.  Build files cannot be regenerated correctly.

Disabling profilers makes the build work:

> cmake -S . -B build -DAMReX_GPU_BACKEND=CUDA  -DAMReX_BASE_PROFILE=OFF -DAMReX_TINY_PROFILE=OFF

...

-- Configuring done (0.6s)
-- Generating done (0.0s)
-- Build files have been written to: /home/rho/git/amrex/build

> cmake --build build -j 16

...

[ 99%] Building CUDA object Src/CMakeFiles/amrex_3d.dir/Particle/AMReX_ParticleContainerBase.cpp.o
[100%] Linking CUDA static library libamrex_3d.a
[100%] Built target amrex_3d

zingale commented 2 months ago

We have the same issue with gmake -- CUDA 12.6 changed some headers that make it incompatible with the profiling.

WeiqunZhang commented 2 months ago

I thought the issue had been resolved in https://github.com/AMReX-Codes/amrex/pull/4064 and that the fix should be in 24.09. Which version of amrex are you using?

@zingale Do you still have the issue with the development branch?

rho-novatron commented 2 months ago

This was on the current development branch as of today, commit 97fcea36cbb4c0c69ff002a34ffa6520e858b6f7 .

rho-novatron commented 2 months ago

PR #4064 seems to have no changes to any CMakeLists, so it might have fixed it for gmake, but not cmake. If I'm looking at the correct documentation (this), it seems the headers need to be downloaded outside of cmake (or checked in, I guess) or fetched using the CMake Package Manager.

zingale commented 2 months ago

Oh, I hadn't realized that PR was merged. Indeed, it now links with GNU make.

WeiqunZhang commented 2 months ago

I cannot reproduce the cmake issue. It somehow works for me. Maybe my cuda installation includes old files. @rho-novatron Does it work if you make the following change?

--- a/Tools/CMake/AMReXParallelBackends.cmake
+++ b/Tools/CMake/AMReXParallelBackends.cmake
@@ -68,7 +68,7 @@ if (  AMReX_GPU_BACKEND STREQUAL "CUDA"

         # nvToolsExt: if tiny profiler or base profiler are on.
         if (AMReX_TINY_PROFILE OR AMReX_BASE_PROFILE)
-            target_link_libraries(amrex_${D}d PUBLIC CUDA::nvToolsExt)
+            target_link_libraries(amrex_${D}d PUBLIC nvtx3-cpp)
         endif ()
    endforeach()

WeiqunZhang commented 2 months ago

Maybe it also depends on cmake version. I am using 3.30.3.

rho-novatron commented 2 months ago

Applying that patch does get me through configuration and generation, but then the build fails:

> cmake --build build -j 16

...

[ 52%] Building CUDA object Src/CMakeFiles/amrex_3d.dir/Base/AMReX_GpuUtility.cpp.o
/home/rho/git/amrex/Src/Base/AMReX_GpuDevice.cpp:25:12: fatal error: nvToolsExt.h: No such file or directory
   25 | #  include <nvToolsExt.h>
      |            ^~~~~~~~~~~~~~
compilation terminated.

The only current conda package I have installed that provides nvToolsExt.h is nsight-compute, and that's not included by the current config. I'm trying to find a way to convince cmake to add that to the include path...

I'm also on cmake 3.30.3.
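
For anyone hitting the same `nvToolsExt.h: No such file or directory` error, here is a minimal shell sketch for checking whether anything under a given prefix provides the header. The function name `find_nvtx_headers` is hypothetical, and the fallback to `CONDA_PREFIX` assumes an activated conda environment.

```shell
# Hypothetical helper (not part of AMReX): list every nvToolsExt.h under a prefix.
# Falls back to $CONDA_PREFIX when no argument is given (assumes an active conda env).
find_nvtx_headers() {
    prefix="${1:-${CONDA_PREFIX:-}}"
    if [ -z "$prefix" ] || [ ! -d "$prefix" ]; then
        echo "usage: find_nvtx_headers <existing-prefix-directory>" >&2
        return 2
    fi
    # Print the path of each matching header; prints nothing if none is installed.
    find "$prefix" -name 'nvToolsExt.h' 2>/dev/null
}
```

Running `find_nvtx_headers "$CONDA_PREFIX"` prints one path per copy of the header. Empty output suggests that no installed package ships it, which is the situation described above apart from the copy bundled with nsight-compute.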

rho-novatron commented 2 months ago

I got it! Just installing the package nvidia::cuda-nvtx-dev makes the original, unpatched development branch work just fine. So it was my fault all along, getting confused by the error messages. I'll try to get WarpX to add the cuda-nvtx-dev package to its list of requirements, and then this should be fine as is. Closing the issue.
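
The resolution can be sketched as a small shell script: install the missing package, then configure and build with the same commands used earlier in the thread. The helper names `run` and `setup_and_configure`, the `DRY_RUN` switch, and the `-y` flag for conda are illustrative assumptions, not anything AMReX or WarpX provides. It is meant to be run from the amrex source directory.

```shell
# Hypothetical wrapper: echo the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Sketch of the fix reported in this thread, under the assumptions stated above.
setup_and_configure() {
    # Package named in the thread; -y (skip the confirmation prompt) is an addition.
    run conda install -y nvidia::cuda-nvtx-dev
    # Same configure and build commands as in the original report, profilers left on.
    run cmake -S . -B build -DAMReX_GPU_BACKEND=CUDA
    run cmake --build build -j 16
}
```

Setting `DRY_RUN=1` before calling `setup_and_configure` only prints the three commands, which is a cheap way to review them before touching the environment.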