Testing master with hash 11787720eb1936d5ee2963097885164bca30e087 (May 24th) and switching the homme_shoc_cld_spa_p3_rrtmgp test to ne4 yields:
The following tests FAILED:
202 - homme_shoc_cld_spa_p3_rrtmgp_np1_vs_np3_bfb (Failed)
203 - homme_shoc_cld_spa_p3_rrtmgp_np1_vs_np4_bfb (Failed)
Note, this means that np1 vs. np2 passed. I confirmed this manually.
Putting the diffs to master here for reproducibility:
diff --git a/components/scream/tests/coupled/dynamics_physics/homme_shoc_cld_spa_p3_rrtmgp/CMakeLists.txt b/components/scream/tests/coupled/dynamics_physics/homme_shoc_cld_spa_p3_rrtmgp/CMakeLists.txt
index dd045cc..c21e536 100644
--- a/components/scream/tests/coupled/dynamics_physics/homme_shoc_cld_spa_p3_rrtmgp/CMakeLists.txt
+++ b/components/scream/tests/coupled/dynamics_physics/homme_shoc_cld_spa_p3_rrtmgp/CMakeLists.txt
@@ -29,7 +29,7 @@ configure_file(${CMAKE_CURRENT_SOURCE_DIR}/homme_shoc_cld_spa_p3_rrtmgp_output.y
# Set homme's test options, so that we can configure the namelist correctly
# Discretization/algorithm settings
-set (HOMME_TEST_NE 2)
+set (HOMME_TEST_NE 4)
set (HOMME_TEST_LIM 9)
set (HOMME_TEST_REMAP_FACTOR 3)
set (HOMME_TEST_TRACERS_FACTOR 1)
@@ -64,7 +64,7 @@ set (TEST_INPUT_FILES
init/spa_init_ne2np4.nc
init/map_ne4np4_to_ne2np4_mono.nc
init/spa_file_unified_and_complete_ne4_20220428.nc
- init/homme_shoc_cld_spa_p3_rrtmgp_init_ne2np4.nc
+ init/init_ne4np4.nc
)
foreach (file IN ITEMS ${TEST_INPUT_FILES})
GetInputFile(${file})
diff --git a/components/scream/tests/coupled/dynamics_physics/homme_shoc_cld_spa_p3_rrtmgp/input.yaml b/components/scream/tests/coupled/dynamics_physics/homme_shoc_cld_spa_p3_rrtmgp/input.yaml
index 8ad376b..ed2b7df 100644
--- a/components/scream/tests/coupled/dynamics_physics/homme_shoc_cld_spa_p3_rrtmgp/input.yaml
+++ b/components/scream/tests/coupled/dynamics_physics/homme_shoc_cld_spa_p3_rrtmgp/input.yaml
@@ -10,7 +10,7 @@ Time Stepping:
Number of Steps: ${NUM_STEPS}
Initial Conditions:
- Filename: ${SCREAM_DATA_DIR}/init/homme_shoc_cld_spa_p3_rrtmgp_init_ne2np4.nc
+ Filename: ${SCREAM_DATA_DIR}/init/init_ne4np4.nc #homme_shoc_cld_spa_p3_rrtmgp_init_ne2np4.nc
Physics GLL:
surf_latent_flux: 0.0
surf_sens_flux: 0.0
@@ -21,7 +21,7 @@ atmosphere_processes:
homme:
Process Name: Homme
Enable Precondition Checks: false
- Vertical Coordinate Filename: ${SCREAM_DATA_DIR}/init/homme_shoc_cld_spa_p3_rrtmgp_init_ne2np4.nc
+ Vertical Coordinate Filename: ${SCREAM_DATA_DIR}/init/init_ne4np4.nc #homme_shoc_cld_spa_p3_rrtmgp_init_ne2np4.nc
Moisture: moist
physics:
atm_procs_list: (mac_aero_mic,rrtmgp)
@@ -38,7 +38,7 @@ atmosphere_processes:
Grid: Physics GLL
spa:
Grid: Physics GLL
- SPA Remap File: ${SCREAM_DATA_DIR}/init/map_ne4np4_to_ne2np4_mono.nc
+ SPA Remap File: none # ${SCREAM_DATA_DIR}/init/map_ne4np4_to_ne2np4_mono.nc
SPA Data File: ${SCREAM_DATA_DIR}/init/spa_file_unified_and_complete_ne4_20220428.nc
p3:
Grid: Physics GLL
diff --git a/components/scream/tests/coupled/physics_only/shoc_cld_spa_p3_rrtmgp/CMakeLists.txt b/components/scream/tests/coupled/physics_only/shoc_cld_spa_p3_rrtmgp/CMakeLists.txt
index 037bc29..428a663 100644
--- a/components/scream/tests/coupled/physics_only/shoc_cld_spa_p3_rrtmgp/CMakeLists.txt
+++ b/components/scream/tests/coupled/physics_only/shoc_cld_spa_p3_rrtmgp/CMakeLists.txt
@@ -21,7 +21,7 @@ set (TEST_INPUT_FILES
init/shoc_cld_spa_p3_rrtmgp_init_ne2np4.nc
init/spa_init_ne2np4.nc
init/map_ne4np4_to_ne2np4_mono.nc
- init/spa_file_unified_and_complete_ne4_20220428.nc
+ init/init_ne4np4.nc
)
foreach (file IN ITEMS ${TEST_INPUT_FILES})
GetInputFile(${file})
diff --git a/components/scream/tests/coupled/physics_only/shoc_cld_spa_p3_rrtmgp/input.yaml b/components/scream/tests/coupled/physics_only/shoc_cld_spa_p3_rrtmgp/input.yaml
index 7ce4e20..237f297 100644
--- a/components/scream/tests/coupled/physics_only/shoc_cld_spa_p3_rrtmgp/input.yaml
+++ b/components/scream/tests/coupled/physics_only/shoc_cld_spa_p3_rrtmgp/input.yaml
@@ -25,7 +25,7 @@ atmosphere_processes:
Grid: Point Grid
spa:
Grid: Point Grid
- SPA Remap File: ${SCREAM_DATA_DIR}/init/map_ne4np4_to_ne2np4_mono.nc
+ SPA Remap File: none # ${SCREAM_DATA_DIR}/init/map_ne4np4_to_ne2np4_mono.nc
SPA Data File: ${SCREAM_DATA_DIR}/init/spa_file_unified_and_complete_ne4_20220428.nc
rrtmgp:
Grid: Point Grid
@@ -34,12 +34,12 @@ atmosphere_processes:
Grids Manager:
Type: Mesh Free
- Number of Global Columns: 218
+ Number of Global Columns: 866
Number of Vertical Levels: 72 # Will want to change to 128 when a valid unit test is available.
Initial Conditions:
# The name of the file containing the initial conditions for this test.
- Filename: ${SCREAM_DATA_DIR}/init/shoc_cld_spa_p3_rrtmgp_init_ne2np4.nc
+ Filename: ${SCREAM_DATA_DIR}/init/init_ne4np4.nc #shoc_cld_spa_p3_rrtmgp_init_ne2np4.nc
Point Grid:
Load Latitude: true
Load Longitude: true
Does the test fail even with 1 timestep?
If so, does this depend on what parametrizations are used in physics? E.g., if you run 1 step of homme+p3 only, is it BFB?
Note: if 1 timestep passes, then it gets harder, because we need the reduced process list to not crash in order to run it for 2+ timesteps.
@bartgol , I'll check this morning and respond.
In the meantime, just to note: running just shoc_cld_spa_p3_rrtmgp on an ne4 grid passes. So there is some dependency on homme for the failure, or potentially just horizontal communication. Given the last red herring, it's possible there is some wrap-around effect happening here, but I'm pretty sure I'm setting homme to se_ne=4. I'll check one more time though.
namelist.nl for the homme+physics that fails:
cubed_sphere_map = 0
disable_diagnostics = .false.
dt_remap_factor = 3
dt_tracer_factor = 1
hypervis_order = 2
hypervis_scaling = 0
hypervis_subcycle = 1
hypervis_subcycle_tom = 0
nu = 7e15
nu_top = 2.5e5
se_ftype = 0
se_geometry = "sphere"
se_limiter_option = 9
se_ne = 4
se_nsplit = -1
se_partmethod = 4
se_topology = "cube"
se_tstep = 300
statefreq = 9999
theta_advect_form = 1
theta_hydrostatic_mode = true
tstep_type = 5
vert_remap_q_alg = 1
and just looking at homme output for the test and the actual simulation output confirms that it is running an ne4 case.
@bartgol , to answer your first question, it looks like the fail doesn't show up until step = 2. So all tests pass if we only take 1 step. I'll test homme+p3 next. I think we can get 2 steps without crashing.
List of tests and results (np24 vs. np48):
- homme_shoc_cld_spa_p3_rrtmgp - FAIL
- shoc_cld_spa_p3_rrtmgp - PASS
- homme_shoc_cld_spa_p3 - FAIL
- homme_shoc_cld_spa - PASS
- homme_p3 - PASS
- homme_cld_spa_p3 - PASS
- homme_shoc_cld_p3 - PASS
Based on this, unfortunately, it looks like a FAIL is triggered when we use homme + mac_mic, all of them. At least we can rule out rrtmgp, but we can't isolate just one physics process.
How many time steps are run? Perhaps you'll get more FAILs if the test runs longer.
@ambrad , I thought of that. So far I'm taking 4 steps. I can try a run with something longer, like 40 steps. I'll note that all of the fails listed above fail at tstep=2, so in those cases it's rather quick.
Okay, finished a 40-step run with homme_shoc_cld_spa and that also PASSES.
I ran ne30 SCREAMv1 on 256 and 320 ranks on Chrysalis and get diffs plotted here for qv at the surface (columns are run1, run2, log10(abs(run2 - run1))): https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.ambradl/scream/ne30v1.diff/ne30v1-qv.html. Output is every time step for one day.
Digging down deeper... running homme_shoc_cld_p3 but commenting out sections of p3:
- p3_main - PASS
- p3_main_init in p3_main - PASS
- p3_init and p3_main_part1,2,3 - PASS
- generalized_sedimentation commented out - PASS
So it seems to be related to the upwind scheme.
Thanks for the plots, Andrew. I'm not sure what we can conclude from them other than that differences between runs start small in a lot of different places and grow over time (as one would expect for non-bfb weather simulations). Am I missing something?
@PeterCaldwell I was interested in the nature of the diffs: how they emerge spatially and temporally, and how much they influence the solution. I think there are interesting points in the first three rows.
One possibility is this is a matter of uninitialized variables. Different PE layouts can reveal these.
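One way to test that hypothesis (a generic sketch, not existing SCREAM code; the view name and size are made up) is to poison freshly allocated work arrays with NaN, so that any read of a value that was never properly initialized propagates NaN into the output instead of silently depending on whatever the previous PE layout left in memory:
#include <Kokkos_Core.hpp>
#include <cstdio>
#include <limits>

int main (int argc, char** argv) {
  Kokkos::initialize(argc, argv);
  {
    // Stand-in for a physics work array (in SCREAM this would be a pack view).
    Kokkos::View<double*> work("work", 16);
    // Fill with quiet NaN right after allocation; a read before a proper
    // initialization then shows up loudly in the output fields.
    Kokkos::deep_copy(work, std::numeric_limits<double>::quiet_NaN());
    auto host_copy = Kokkos::create_mirror_view(work);
    Kokkos::deep_copy(host_copy, work);
    std::printf("work(0) = %f\n", host_copy(0)); // prints nan
  }
  Kokkos::finalize();
  return 0;
}
A debug flag like gfortran's -finit-real=snan plays a similar role on the Fortran (homme) side, but view-poisoning like the above also covers the C++/Kokkos buffers.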
@AaronDonahue what exactly do you mean by "cloud sedimentation with generalized_sedimentation commented out"? To break the while loop on dt_left, we need generalized_sedimentation to compute dt_sub and then update dt_left. Thus, I tried #if 0-ing out everything but that dt calc, like this:
template <typename S, typename D>
template <int nfield>
KOKKOS_FUNCTION
void Functions<S,D>
::generalized_sedimentation (
  const uview_1d<const Spack>& rho,
  const uview_1d<const Spack>& inv_rho,
  const uview_1d<const Spack>& inv_dz,
  const MemberType& team,
  const Int& nk, const Int& k_qxtop, Int& k_qxbot, const Int& kbot, const Int& kdir, const Scalar& Co_max, Scalar& dt_left, Scalar& prt_accum,
  const view_1d_ptr_array<Spack, nfield>& fluxes,
  const view_1d_ptr_array<Spack, nfield>& Vs, // (behaviorally const)
  const view_1d_ptr_array<Spack, nfield>& rs)
{
  // compute dt_sub
  EKAT_KERNEL_ASSERT(Co_max >= 0);
  const Int tmpint1 = static_cast<int>(Co_max + 1);
  const Scalar dt_sub = dt_left/tmpint1;
#if 0
  // Move bottom cell down by 1 if not at ground already
  const Int k_temp = (k_qxbot == kbot) ? k_qxbot : k_qxbot - kdir;
  calc_first_order_upwind_step<nfield>(rho, inv_rho, inv_dz, team, nk, k_temp, k_qxtop, kdir, dt_sub, fluxes, Vs, rs);
  team.team_barrier();
  // accumulated precip during time step
  if (k_qxbot == kbot) {
    const auto sflux0 = scalarize(*fluxes[0]);
    prt_accum += sflux0(kbot) * dt_sub;
  }
  else {
    k_qxbot -= kdir;
  }
#endif
  // update time remaining for sedimentation
  dt_left -= dt_sub;
}
When I do this, I still see the same diff pattern.
Yeah, you're right @ambrad , thanks for checking my work. There was a typo in how I commented out the generalized_sedimentation routine that effectively nullified all of cloud_sedimentation, which points to just the interface of the sedimentation. I'll keep digging today.
Okay, I think I've narrowed it down to SPA. It's the application of prescribed_CCN in P3 that appears to make the difference. The test passes when I either set nc_activated (soon nccn) to 0 in SPA, or when I simply comment out
if (do_prescribed_CCN) {
  // nc(k).set(not_drymass, max(nc(k), nccn_prescribed(k)));
} else if (predictNc) {
  nc(k).set(not_drymass, max(nc(k) + nc_nuceat_tend(k) * dt, 0.0));
} else {
  // nccnst is in units of #/m3 so needs to be converted.
  nc(k).set(not_drymass, nccnst*inv_rho(k));
}
in p3_main_impl_part1.hpp.
@AaronDonahue I tried commenting that out, and the ne30 1-day runs still diff. But if I return from P3 main immediately, it doesn't. So it is something in P3, but it seems not to be just that one line.
@ambrad , what happens if you hijack SPA and force nc_activated to be uniformly 0.0? A la:
[asdonah@blake spa]$ git diff atmosphere_prescribed_aerosol.cpp
diff --git a/components/scream/src/physics/spa/atmosphere_prescribed_aerosol.cpp b/components/scream/src/physics/spa/atmosphere_prescribed_aerosol.cpp
index 0538e63..bbc25d1 100644
--- a/components/scream/src/physics/spa/atmosphere_prescribed_aerosol.cpp
+++ b/components/scream/src/physics/spa/atmosphere_prescribed_aerosol.cpp
@@ -218,6 +218,8 @@ void SPA::run_impl (const int dt)
const auto& pmid_tgt = get_field_in("p_mid").get_view<const Pack**>();
SPAFunc::spa_main(SPATimeState, pmid_tgt, m_buffer.p_mid_src,
SPAData_start,SPAData_end,m_buffer.spa_temp,SPAData_out);
+
+ Kokkos::deep_copy(SPAData_out.CCN3,0.0);
}
// =========================================================================================
I ran a simulation with the SPA 0-copy and still see the diff pattern.
Okay, then the homme+physics unit test is not sufficient for debugging this problem... good to know. I've verified that without that change it fails, and with the change it passes. So we may be dealing with a case where the unit test is missing something that the CIME cases catch.
Just to humor me, what happens if we zero out all of the forcing from SPA? So adding Kokkos::deep_copy(SPAData_out.X,0.0) for X = AER_G_SW, AER_SSA_SW, AER_TAU_SW, and AER_TAU_LW?
Or, simpler, we could just turn SPA off in the scream input.
There appears to be an issue with SPA not reading its data properly that I am trying to track down.
Sorry for the delay. Based on your suggestion, I did this:
--- a/components/scream/src/physics/spa/atmosphere_prescribed_aerosol.cpp
+++ b/components/scream/src/physics/spa/atmosphere_prescribed_aerosol.cpp
@@ -164,7 +164,7 @@ void SPA::initialize_impl (const RunType /* run_type */)
@@ -197,6 +197,12 @@ void SPA::initialize_impl (const RunType /* run_type */)
// upper bound set to 1.01 as max(g_sw)=1.00757 in current ne4 data assumingly due to remapping
// add an epslon to max possible upper bound of aero_ssa_sw
+ Kokkos::deep_copy(SPAData_out.CCN3, 0);
+ Kokkos::deep_copy(SPAData_out.AER_G_SW, 0);
+ Kokkos::deep_copy(SPAData_out.AER_SSA_SW, 0);
+ Kokkos::deep_copy(SPAData_out.AER_TAU_SW, 0);
+ Kokkos::deep_copy(SPAData_out.AER_TAU_LW, 0);
+
add_postcondition_check<FieldWithinIntervalCheck>(get_field_out("aero_g_sw"),m_grid,0.0,1.0,true);
add_postcondition_check<FieldWithinIntervalCheck>(get_field_out("aero_ssa_sw"),m_grid,0.0,1.0,true);
add_postcondition_check<FieldWithinIntervalCheck>(get_field_out("aero_tau_sw"),m_grid,0.0,1.0,true);
@@ -206,6 +212,7 @@ void SPA::initialize_impl (const RunType /* run_type */)
// =========================================================================================
void SPA::run_impl (const int dt)
{
+ return;
/* Gather time and state information for interpolation */
auto ts = timestamp();
/* Update the SPATimeState to reflect the current time, note the addition of dt */
@@ -217,6 +224,8 @@ void SPA::run_impl (const int dt)
const auto& pmid_tgt = get_field_in("p_mid").get_view<const Pack**>();
SPAFunc::spa_main(SPATimeState, pmid_tgt, m_buffer.p_mid_src,
SPAData_start,SPAData_end,m_buffer.spa_temp,SPAData_out);
}
But it still diffs.
To be sure I wasn't somehow checking the wrong output files or something, I retested with an immediate return from p3_main (which previously I had noted resolves the diffs), and see no diff. So my test results are very likely correct.
I'll do some bisecting inside P3 to see if I can find anything.
Right now it's not clear to me P3 is actually the problem. I disabled code to the point that I could cause the diff or prevent it by toggling a single line of code (the nc.set line in cloud_dsd2), but I think that's a red herring; that line of code is not really the problem. I need to think about this issue some more.
Status: I updated to today's master branch (f1c36273f0). I still see the diffs (as expected). My comparison runs are configured to do 6 time steps and use ATM_NCPL=288, dt_remap_factor=1, no process subcycling, GNU debug, pack size 1, ne30 F2010-SCREAMv1, on Chrysalis.
@ambrad , thanks for the update. I was able to re-create your fails on quartz using master from yesterday (8ba92eb350c18dea27febd5371170634c8c0f47b), running F2010-SCREAMv1 on an ne4 mesh for one day, out-of-the-box, meaning with the subcycling, etc., on 48 and 96 ranks respectively. I'm testing both intel-DEBUG and intel-OPT.
As you pointed out, I can get the PASS by commenting out a small section in P3, in my case all of p3_main_part1, but I agree this is likely a red herring, especially because your line is in a different place than mine.
Could the location of the line that works be compiler/machine related? On blake I could get it to pass by commenting out the nc_activated line inside p3_main_part1, but on quartz I need to skip p3_main_part1 altogether, and for you it's the dsd2 line in cloud sedimentation.
Note, I don't think we should start chasing phantoms in the code, but maybe the compiler/machine dependency is a helpful clue.
I suspect there is something like uninitialized workspace somewhere, causing difficulties in isolating the exact cause. (But it's probably init'ed memory from valgrind's viewpoint; I'm speculating PE-layout-dependent workspace buffer memory that has values from other processes.)
Re: machine/compiler, I switched to GNU-debug-packsize=1 in order to avoid any possible issues with Intel optimizations and vectorization.
@AaronDonahue I went back to your comment about SPA. I realized that in my original tests of making SPA irrelevant, I was allowing nc to have possibly uninitialized values (although I'm not fully convinced of that). In any case, I've switched now to this (single) modification:
#if 1
nc(k).set(not_drymass, nccnst*inv_rho(k));
#else
if (do_prescribed_CCN) {
  nc(k).set(not_drymass, max(nc(k), nccn_prescribed(k)/inv_cld_frac_l(k)));
} else if (predictNc) {
  nc(k).set(not_drymass, max(nc(k) + nc_nuceat_tend(k) * dt, 0.0));
} else {
  // nccnst is in units of #/m3 so needs to be converted.
  nc(k).set(not_drymass, nccnst*inv_rho(k));
}
#endif
This fixes the diffs for my short runs. I'm setting up longer runs and will report back.
Longer runs diff. So it wasn't that.
I wanted to double-check another thing with a clean repo. I continue to run with NCPL=288, pack size 1, and the GNU compilers.
As reported in the last message, skipping the nc modification in the do_prescribed_CCN block does not fix the diff.
Next, I ran a simulation with a return from p3_main_part1 here:
nucleationPossible = false;
hydrometeorsPresent = false;
team.team_barrier();
return; // skip most of part1
It crashes just after 15 hours with TL1_1 < 0. But as already reported, the diff is gone up to that time. Many of the microphysics fields are 0, though, so (as we already know) this doesn't isolate the bug very well.
More observations and thoughts:
Not sure if it adds useful information, but skipping HOMME by adding a return; to the run_impl in the homme interface also switched fail to pass for me. Even just commenting out the call to prim_run_f90, which ultimately calls prim_run_subcycle, was sufficient to get my tests to pass.
Do we know the amount of non-bfb-ness? I'm wondering if this is due to O(eps) differences between the same DOF replicated on bordering elems.
Basically, on the dyn grid, if dof N is shared across elems 1, 12, and 123, only one of the copies of that DOF will be saved to the output file. These can be non-bfb, since the MPI accumulation of the 3 contributions during DSS can happen in a different order on the 3 elements. This is why, even for the GLL physics grid, dynamics needs to write restart files on the full dyn grid (rather than just the unique GLL version of it).
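As a small, self-contained illustration of that accumulation-order point (generic C++, not SCREAM code): summing the same three contributions with a different association typically changes the last bits of the result.
#include <cstdio>

int main () {
  // Three "element contributions" to a shared GLL dof, summed in two orders.
  const double a = 0.1, b = 0.2, c = 0.3;
  const double s1 = (a + b) + c; // 0.60000000000000009
  const double s2 = a + (b + c); // 0.59999999999999998
  std::printf("s1 = %.17g\ns2 = %.17g\nbfb: %s\n", s1, s2, s1 == s2 ? "yes" : "no");
  return 0;
}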
With different npN, I suspect we can get a slightly different d2p remap, in the sense that the element that is assigned the unique DOF might not be the same for np1 and np4, leading to a value saved to the output nc file that is slightly different.
@ambrad Correct me if I'm wrong, but I suspect this is quite likely. Although, if this is true, then all ERP tests would be bound to fail, so maybe I'm wrong.
Note: inside each elem, the DSS is always performed in the same way. The issue is which elem is assigned the unique DOF (which might change with the number of ranks, due to a different space-filling curve split), which is what determines which DSS rearrangement is used in the IO.
Apologies if you've already checked this and I didn't catch it - do we know whether the non-bfbness is solely in the output, or is the solution different? I think Luca's hypothesis would just change the output. You should be able to check whether the solution is BFB by checking whether homme.atm.log has BFB values for the last timestep's output...
The diffs are in the solution state, so they are real. One sees the diffs start locally at a small magnitude and then expand in space and grow in magnitude.
3-hourly snapshots of qv at level 62 of 72 are here: https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.ambradl/scream/ne30v1.diff/ne30v1-qv.html. Only the third column, the diff, is of interest, as the diffs are too small in magnitude in this short run to see in the fields themselves, in the case of qv (left two columns). Note how the diff starts small (by hour 3) and then quickly grows.
@PeterCaldwell re: your specific question of homme_atm.log output, if I diff the "qv( 1)" lines, I see diffs starting in the 4th time step (with ATM_NCPL=288 so that dynamic time steps are 1-1 with physics time steps). This output is based on model state.
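In case it's useful, here is a tiny standalone helper along those lines (generic C++, not part of SCREAM; it assumes the two runs produce line-aligned logs) that reports the first differing line containing "qv" between two homme_atm.log files:
#include <fstream>
#include <iostream>
#include <string>

// Walk two line-aligned log files and report the first line containing "qv"
// that differs between them. Assumes the runs log the same lines in the same
// order (roughly true for the per-step state prints compared here).
int main (int argc, char** argv) {
  if (argc != 3) { std::cerr << "usage: qv_diff log1 log2\n"; return 1; }
  std::ifstream f1(argv[1]), f2(argv[2]);
  std::string l1, l2;
  long lineno = 0;
  while (std::getline(f1, l1) && std::getline(f2, l2)) {
    ++lineno;
    if (l1.find("qv") != std::string::npos && l1 != l2) {
      std::cout << "first qv diff at line " << lineno << ":\n"
                << l1 << "\n" << l2 << "\n";
      return 0;
    }
  }
  std::cout << "no qv diffs found\n";
  return 0;
}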
Although the tests to date don't yet support isolating the problem just to the AD, I do speculate that something in the dyn<->phys grid handling is the issue, in particular how overlapping dyn GLL columns get mapped to unique physics columns for the parameterizations phase. I speculate there could be an issue where the GLL physics global ID is not PE-invariantly mapped to a GLL dynamics global ID.
Actually, I'm not sure my comment would only apply to the output. All d2p remaps would suffer from np4 having a different elem owner for dof XYZ.
Re-reading my post, I'm thinking this is probably not the case, or else all ERP tests would make no sense...
I speculate there could be an issue where the GLL dynamics global ID is not PE-invariantly mapped to the GLL physics global ID.
Yes, that is a reasonable possibility. I will revisit that implementation.
A first look at src/dynamics/homme/interface/phys_grid_mod.F90, which is where we compute physics dofs, seems to suggest that the dofs should be PE independent. In particular, the number of dofs on each elem and the actual dof gids seem to only depend on the element gids, which, as far as I understand from homme's mesh/cube/dof mods, are PE independent. But since it's late here, I'll check again tomorrow.
@AaronDonahue I confirmed in my setup what you wrote re: disabling the dycore.
@bartgol I looked at the remap code out of curiosity. Just taking a random stab at this, I'm speculating that in create_p2d_map, the loop https://github.com/E3SM-Project/scream/blob/22add4597ba4a172e7cf30dce4636e89b4c17e64/components/scream/src/dynamics/homme/physics_dynamics_remapper.cpp#L980-L993 can assign a different dyn col to a physics col, depending on what elements are on an MPI process. Consider a physics grid point at the corner of an element, shared by four elements. Then consider what happens in these two cases: (1) a process has all four elements sharing that physics grid point; (2) a process has just one element total. To me it seems that in case 1, the chosen dynamics grid point could be in principle any of the four elements sharing that geographical point, whereas in case 2, there's only one dynamics grid point available in the search.
If that's correct, then I think a solution is to have the concept of an owning element for each physics point. One could establish the convention that the element having the lowest element GID among those sharing the physics point is the owner or, similarly, the lowest dynamics GID. Edit later: Actually, the element convention would lead to one element having just four physics columns, bad for load balancing. In EAM, I think Patrick Worley may have implemented a number of more sophisticated schemes for load balancing optimally. Once pgN is in, load balance will be trivial since only one element is associated with a pgN physics column.
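A minimal, self-contained sketch of the owning-element convention suggested above (hypothetical struct and field names, not the actual remapper code): among the dynamics columns that coincide with a physics point, pick a PE-invariant winner by global IDs rather than whatever happens to be found first in the local element list.
#include <algorithm>
#include <cstdio>
#include <vector>

// Hypothetical candidate: a dynamics (GLL) column that geographically
// coincides with a given physics column.
struct DynCandidate {
  int elem_gid; // global id of the element containing the column
  int dyn_gid;  // global id of the dynamics column
};

// Deterministic owner: smallest element GID, ties broken by dynamics GID.
// Applied to the full set of sharing elements, the result does not depend
// on which elements live on the local rank.
DynCandidate pick_owner (const std::vector<DynCandidate>& candidates) {
  return *std::min_element(candidates.begin(), candidates.end(),
    [] (const DynCandidate& a, const DynCandidate& b) {
      if (a.elem_gid != b.elem_gid) return a.elem_gid < b.elem_gid;
      return a.dyn_gid < b.dyn_gid;
    });
}

int main () {
  // A corner physics point shared by four elements.
  const std::vector<DynCandidate> cands = { {123, 910}, {1, 17}, {12, 203}, {55, 640} };
  const auto owner = pick_owner(cands);
  std::printf("owner: elem %d, dyn col %d\n", owner.elem_gid, owner.dyn_gid);
  return 0;
}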
I think the owning element is dictated by dynamics (the dof mod does the resolution of shared dofs, picking the elem with smallest GID to own the dof). I'll double check this snippet, and if you're right, I suspect we can easily fix it.
Looking at the code again, I think this is a leftover from the initial implementation of p2d and all the grid specs in scream. There was a bug we fixed quite a while ago, which had scream dofs not ordered correctly, causing a mismatch of dof numbering against v0. I suspect this snippet in the remapper was relying on some assumption in the original implementation, and didn't get fixed.
Ok, I think I see the issue (though I have no idea why all tests pass if I'm seeing well). Will report shortly.
@ambrad nailed it when he pointed to the p2d map. The fix is not that complicated, so I should have a PR up by the end of the day (in some timezone).
Nice sleuthing! Looking forward to the PR. Thanks @ambrad and @bartgol for getting to the bottom of this.
Sorry, but I have to curb the enthusiasm. What Andrew said was not entirely true. Here are a few key points of dof construction:
These points ensure that, regardless of number of ranks, each physics DOF is always matched to the same dynamics column.
I verified (by enriching the SE grid with some extra data) that the map SE_elemGID ( SE_elemLID ( dynLID ( phys gid ))) gave the same answer on 1 and 4 ranks (while I am getting diffs for the np1 vs np4 test).
Edit: there is still the chance things are not entirely correct, but it's a small chance: it is possible that while gathering dynamics dofs, I'm scanning the element list in the wrong order.
Ok, I can confirm that the p2d map is indeed correct, despite the misleading comment at the beginning of the cpp file:
// For each phys dofs, we find a corresponding dof in the dyn grid.
// Notice that such dyn dof may not be unique (if phys dof is on an edge
// of a SE element), but we don't care. We just need to find a match.
// The BoundaryExchange already takes care of syncing all shared dyn dofs.
We do care which dynamics dof we grab, but it just so happens that we are grabbing the same one, regardless of the number of ranks.
The homme_shoc_cld_spa_p3_rrtmgp test in the SCREAM unit test suite fails when comparing solutions obtained with different numbers of ranks, once the grid is switched to ne4.
Note that the test does not fail in the same way when the grid is the default ne2.