Closed MatthewPyle-NOAA closed 8 months ago
A quick update - got things to run with a 2 x 128 write task setup, but only when writing to a tiny output grid. The same setup also worked for a reasonably large output grid (roughly 80% of the size of the RRFS output grid).
Shifting to a 2 x 192 write task setup allowed the full RRFS output grid to be written. So slowly getting unstuck.
Have a working quilt specification - just required more quilt nodes than expected.
Description
When attempting to use multiple write groups to keep pace with sub-hourly output, the model hung before true integration had begun and eventually timed out.
To Reproduce:
What compilers/machines are you seeing this with?
Seen on WCOSS2 with Intel built code.
Multiple write groups run without a problem for smaller domains, but seem to have trouble with the RRFS North America regional domain, which has dimensions 3950x2700x65.
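For a rough sense of scale, here is a back-of-the-envelope sketch of how much of the grid each write task would handle under the 128 and 192 tasks-per-group setups mentioned in the update. It rests on several assumptions that are not confirmed in this issue: that the output grid has the same 3950x2700x65 dimensions as the domain, that each write group handles a full output time with the grid split evenly among its write tasks, and that fields are single precision. The function name `per_task_load` is purely illustrative.

```python
# Back-of-the-envelope sizing only; not a diagnosis of the hang.
NX, NY, NZ = 3950, 2700, 65   # domain dimensions quoted in this issue
BYTES_PER_VALUE = 4           # assumption: single-precision output fields


def per_task_load(write_tasks_per_group, nx=NX, ny=NY, nz=NZ,
                  bytes_per_value=BYTES_PER_VALUE):
    """Return (horizontal points per write task, MiB per task for one 3D field).

    Assumes the grid is divided evenly among the write tasks of one group.
    """
    points = nx * ny / write_tasks_per_group
    mib = points * nz * bytes_per_value / 2**20
    return points, mib


if __name__ == "__main__":
    for tasks in (128, 192):
        points, mib = per_task_load(tasks)
        print(f"{tasks} tasks/group: {points:,.0f} points/task, "
              f"{mib:,.1f} MiB per 3D field per task")
```

Under these assumptions, going from 128 to 192 tasks per group cuts the per-task share of each field by a third, which would be consistent with the larger setup being able to write the full grid. The actual cause of the hang is not established here.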