ECP-copa / Cabana

Performance-portable library for particle-based simulations
Other
188 stars 51 forks source link

Possibly unclosed Kokkos Profiling Region in Halo gather #736

Closed terwin closed 2 months ago

terwin commented 3 months ago

A Kokkos profiling region is created in Cabana::Grid::Halo::gather at grid/src/Cabana_Grid_Halo.hpp:291. When the neighbor size is zero, gather will immediately return without closing the profiling region.

This resulted in a SEGFAULT when profiling a test case on a single rank.

@kwitaechong

dalg24 commented 3 months ago

Cabana only requires Kokkos 3.7 at the moment but release 4.1 added Kokkos::Profiling::ScopedRegion which was precisely designed for this situation.

dalg24 commented 3 months ago

In the meantime you could do

diff --git a/grid/src/Cabana_Grid_Halo.hpp b/grid/src/Cabana_Grid_Halo.hpp
index aee5f2f5..80cdaee3 100644
--- a/grid/src/Cabana_Grid_Halo.hpp
+++ b/grid/src/Cabana_Grid_Halo.hpp
@@ -293,7 +293,10 @@ class Halo
         // Get the number of neighbors. Return if we have none.
         int num_n = _neighbor_ranks.size();
         if ( 0 == num_n )
+        {
+            Kokkos::Profiling::popRegion();
             return;
+        }

         // Get the MPI communicator.
         auto comm = getComm( arrays... );

or

diff --git a/grid/src/Cabana_Grid_Halo.hpp b/grid/src/Cabana_Grid_Halo.hpp
index aee5f2f5..f3938553 100644
--- a/grid/src/Cabana_Grid_Halo.hpp
+++ b/grid/src/Cabana_Grid_Halo.hpp
@@ -288,13 +288,13 @@ class Halo
     void gather( const ExecutionSpace& exec_space,
                  const ArrayTypes&... arrays ) const
     {
-        Kokkos::Profiling::pushRegion( "Cabana::Grid::gather" );
-
         // Get the number of neighbors. Return if we have none.
         int num_n = _neighbor_ranks.size();
         if ( 0 == num_n )
             return;

+        Kokkos::Profiling::pushRegion( "Cabana::Grid::gather" );
+
         // Get the MPI communicator.
         auto comm

depending on whether you care to see the region in the tools

terwin commented 3 months ago

The same issue is present in Grid::Halo::scatter and similarly, but more complicated, in

Rather than dealing with possible early returns/exceptions, explicit pushRegion/popRegion pairs could be replaced by Kokkos::Profiling::ScopedRegion. This could be applied throughout the code, but should at least be used in the above and possibly in

terwin commented 3 months ago

Cabana only requires Kokkos 3.7 at the moment but release 4.1 added Kokkos::Profiling::ScopedRegion which was precisely designed for this situation.

@dalg24 I tracked that down before seeing your comment, but didn't notice that it was introduced in 4.1. I did find other fragile instances of {push,pop}Region throughout as listed above. Those should cover all cases with return and/or throw in the body, but obviously won't account for any callees that throw. Every profiling region in the code is already defined by its own scope, so I don't see any reason not to prefer ScopedRegion; RAII for the win.

dalg24 commented 3 months ago

My preference would be bump the minimum requirement for Kokkos but another option might be what we were doing in ArborX before requiring 4.1 https://github.com/arborx/ArborX/blob/4bf64b83f53cc6c69619f692cf7f5ef9ac94fc42/src/kokkos_ext/ArborX_DetailsKokkosExtScopedProfileRegion.hpp

streeve commented 3 months ago

I have no problem bumping the min required to 4.1

@kwitaechong can you replace all {push,pop}Region with ScopedRegion?