Open DavidHuber-NOAA opened 7 months ago
I believe that the problematic code is located here: https://github.com/NOAA-EMC/global-workflow/blob/de8706702ead0630beb54d868f83aa2cb23f8f79/scripts/exglobal_atmos_analysis.sh#L576-L593
Looping over `npe_gsi-1` will not create all of the links necessary if `npe` does not equal `ncpus=(npe_node*nodes)`. To fix this, the loop should be changed to loop over `npe_node * nnodes - 1`.
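To make the mismatch concrete, here is a minimal sketch of the two loop bounds. It is not the code from `exglobal_atmos_analysis.sh`: the values of `npe_gsi`, `npe_node`, and `nnodes` are made up for illustration, and the `dir.NNNN` naming and the `pe_dirs` helper are assumptions, not taken from the script.

```shell
#!/usr/bin/env bash
# Hypothetical values, chosen only to illustrate the mismatch.
npe_gsi=84      # MPI tasks requested for the GSI (assumed)
npe_node=40     # tasks per node (assumed)
nnodes=3        # nodes allocated (assumed)

# Last zero-based PE index under each loop bound.
current_last=$(( npe_gsi - 1 ))            # bound described as the current one
proposed_last=$(( npe_node * nnodes - 1 )) # bound proposed in this issue

# Hypothetical helper: print one per-PE directory name for indices 0..last.
# A real script would create and link the directories instead of printing.
pe_dirs() {
  local last=$1 pe
  for pe in $(seq 0 "$last"); do
    printf 'dir.%04d\n' "$pe"
  done
}

# Wrap in $(( )) to strip any whitespace that wc may emit.
current_count=$(( $(pe_dirs "$current_last" | wc -l) ))
proposed_count=$(( $(pe_dirs "$proposed_last" | wc -l) ))

echo "current loop:  ${current_count} directories"
echo "proposed loop: ${proposed_count} directories"
echo "missing:       $(( proposed_count - current_count ))"
```

With these made-up numbers, the two bounds differ by 36 entries, which is the kind of gap that would leave some links uncreated whenever the two counts disagree.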
What is wrong?
If the `enkfgdaseobs` job is run with more processors than (MPI tasks) x (threads), some data will be left uncollected, resulting in an incomplete analysis. Kludges have been put in place for S4 and Jet, but new systems with different core/node counts will need similar kludges.
What should have happened?
The `enkfgdaseobs` job should be able to collect all necessary data regardless of how many cores are used.
What machines are impacted?
All or N/A
Steps to reproduce
An example pair of plots from @CoryMartin-NOAA is below:
Additional information
This was first captured in #154.
Do you have a proposed solution?
I'm not sure whether this requires a scripting change in the global-workflow or a code change in the GSI. Once it is fixed, the config.resources file should be simplified to use the same number of processes across all systems.