Closed AaronDonahue closed 2 years ago
I've confirmed we get the same behavior w/ master.
Here is a sample of the error message:
KERNEL CHECK FAILED:
!ekat::OnGpu<typename device_%s:%u: %s: block:pace>::value || is_pow_of_2(team.team_size())
Error! Team-level parallel_scan on CUDA only works for team sizes that are power of 2.
You could try to reduce the team size to the previous pow of 2.
I've made some progress on tracking down the issue here. @bartgol , @tcclevenger , @ambrad or @jgfouca you may be able to offer some further insight because I think it has to do w/ team policy. The three diagnostics that fail on weaver
w/ the above error all use calculate_dz
from the physics functions: https://github.com/E3SM-Project/scream/blob/master/components/scream/src/share/util/scream_common_physics_functions_impl.hpp#L299
There are two other diagnostics that use the same function and they do work, atm_density
and vertical_layer_thickness
. Looking at the usage in both working and non-working cases there is a pattern.
Kokkos::parallel_for("VerticalLayerThicknessDiagnostic",
Kokkos::RangePolicy<>(0,m_num_cols*npacks),
KOKKOS_LAMBDA(const int& idx) {
const int icol = idx / npacks;
const int jpack = idx % npacks;
dz(icol,jpack) = PF::calculate_dz(pseudo_density_mid(icol,jpack), p_mid(icol,jpack), T_mid(icol,jpack), qv_mid(icol,jpack));
});
NOT-WORKING: https://github.com/E3SM-Project/scream/blob/master/components/scream/src/diagnostics/vertical_layer_interface.cpp#L59
view_1d dz("",npacks);
Kokkos::parallel_for("VerticalLayerInterfaceDiagnostic",
default_policy,
KOKKOS_LAMBDA(const MemberType& team) {
const int icol = team.league_rank();
const auto TTR = Kokkos::TeamThreadRange(team, npacks);
Kokkos::parallel_for(Kokkos::TeamThreadRange(team, npacks), [&] (const Int& jpack) {
dz(jpack) = PF::calculate_dz(pseudo_density_mid(icol,jpack), p_mid(icol,jpack), T_mid(icol,jpack), qv_mid(icol,jpack));
});
team.team_barrier();
const auto& z_int_s = ekat::subview(z_int, icol);
PF::calculate_z_int(team,num_levs,dz,surf_geopotential,z_int_s);
});
I don't see anything inside the calculate_dz function that might trigger the error, but this function does call column_scan
from the column_ops. https://github.com/E3SM-Project/scream/blob/master/components/scream/src/share/util/scream_column_ops.hpp#L234
Note, I was able to narrow it down to calculate_dz
by simple commenting out the line and then the tests will run to completion. Also note, this is when they are used as output. The unit tests for the three diagnostics pass on weaver.
Oh, of course: you need dz
to have size (ncols,npacks)
. Otherwise different teams will try to write on the same pack.
ahhhh simple solution. Great, I'll give it a try
Also, you should prob pre-allocate that view, so you don't call malloc every time the diagnostic is computed.
Also, you should prob pre-allocate that view, so you don't call malloc every time the diagnostic is computed.
view_1d dz("",npacks);
isn't that what this line does?
To be fair, you don't need (ncol,npacks)
. You could use a WSM, to use less memory. But (ncols,npacks)
should def work.
Also, you should prob pre-allocate that view, so you don't call malloc every time the diagnostic is computed.
view_1d dz("",npacks);
isn't that what this line does?
That allocates the view at every call. I meant to allocate it at init time, and store it in the class.
ahhh okay
Interesting that this only failed on weaver.. was it run on the cpu machines?
Interesting that this only failed on weaver.. was it run on the cpu machines?
I think on CPU we never exercise threads, except for p3/shoc functions. So on CPU, this is equiv to a single slot WSM.
Solved w/ #2023
This issue was discovered when developing PR #1989 . This PR also added an output stream to the
homme_shoc_cld_spa_p3_rrtmgp
unit test to output just all of the available diagnostics. This test failed onweaver
for the three diagnostics:We should investigate this error and fix it. Note, there were changes to the output stream in this PR so it's possible the PR itself broke these diagnostics, although it is not clear why these three would be uniquely effected.