Open JessicaMeixner-NOAA opened 1 week ago
@JessicaMeixner-NOAA I will take a look at this issue and prioritize it.
To explain the nature of the change required for this issue: the ufs.configure templates for traditional threading are available in the ufs-weather-model and will be used. We also need to determine how to run the executable and whether the resource requests on the job-card #SBATCH lines need to change.
We will need to replace the APRUN command line, srun -n <nprocs> $FCSTEXEC, with a more detailed execution sequence for traditional threading, such as this:
time mpiexec -l --line-buffer -n 1392 -ppn 32 --cpu-bind depth --depth 4 env OMP_NUM_THREADS=4 $FCSTEXEC : \
-n 220 -ppn 128 --cpu-bind depth --depth 1 env OMP_NUM_THREADS=1 $FCSTEXEC : \
-n 120 -ppn 120 --cpu-bind depth --depth 1 env OMP_NUM_THREADS=1 $FCSTEXEC : \
-n 80 -ppn 64 --cpu-bind depth --depth 2 env OMP_NUM_THREADS=2 $FCSTEXEC
The numbers here are only representative of the level of detail needed to construct the APRUN command.
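As a rough illustration of how such a command line could be assembled from per-component settings, here is a minimal Python sketch. The Segment type, the build_mpmd_command function, and the idea of describing each component as (tasks, tasks per node, threads) are assumptions made for illustration and are not existing global-workflow code. The flags simply mirror the representative mpiexec example above, and the numbers are the same placeholder values.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One MPMD segment: a group of MPI ranks sharing a thread count.

    Hypothetical helper type, not part of global-workflow.
    """
    ntasks: int   # MPI ranks in this segment (-n)
    ppn: int      # ranks placed on each node (-ppn)
    threads: int  # OpenMP threads per rank (--depth and OMP_NUM_THREADS)


def build_mpmd_command(segments, exe="$FCSTEXEC",
                       launcher="mpiexec -l --line-buffer"):
    """Join one argument group per segment with ' : ' (MPMD syntax)."""
    parts = [
        f"-n {s.ntasks} -ppn {s.ppn} --cpu-bind depth --depth {s.threads} "
        f"env OMP_NUM_THREADS={s.threads} {exe}"
        for s in segments
    ]
    return f"{launcher} " + " : ".join(parts)


# Representative values copied from the example command above.
segments = [
    Segment(1392, 32, 4),
    Segment(220, 128, 1),
    Segment(120, 120, 1),
    Segment(80, 64, 2),
]
command = build_mpmd_command(segments)
print(command)
```

Keeping the thread count per segment rather than global is the point of the sketch: each component can request its own depth instead of every task sharing one value.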
@aerorahul that will be awesome to have that level of detail in the executable instead of having everything have the same number of threads!!
What new functionality do you need?
Due to an issue with the model hanging on Orion/Hercules at high resolutions (C768/C1152), most likely associated with ESMF managed threading (see https://github.com/ufs-community/ufs-weather-model/issues/2486 for more details), we'd like to get traditional threading as an option in the g-w to support GFSv17 development. This comment from @aerorahul https://github.com/ufs-community/ufs-weather-model/issues/2486#issuecomment-2465658188 says that a different set of ufs_configure files is needed. I think these are now available in ufs-weather-model, though. I believe those are the options without the _esmf in their names.
Hercules and Orion are being explored as potential options for retrospective runs, as WCOSS2 is busy with multiple large implementations coming up.
What are the requirements for the new functionality?
Allow for an option to run with traditional threading, since ESMF managed threading currently appears to be an issue on Hercules/Orion.
Acceptance Criteria
Suggest a solution (optional)
I believe that the calculation of resources also needs to be updated.
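One possible shape for that updated calculation is sketched below in Python. It is not the existing g-w resource logic. It assumes each component is described by (tasks, tasks per node, threads per task), as in the representative mpiexec example in the comments, and that components do not share nodes. The function names are hypothetical.

```python
import math


def nodes_needed(segments):
    """Total nodes, assuming each segment occupies whole nodes of its own.

    segments: list of (ntasks, ppn, threads) tuples.
    """
    return sum(math.ceil(ntasks / ppn) for ntasks, ppn, _ in segments)


def cores_per_node(segments):
    """Cores each segment uses on one of its nodes: ranks per node x threads."""
    return [ppn * threads for _, ppn, threads in segments]


# Representative values from the example command in the comments.
segments = [(1392, 32, 4), (220, 128, 1), (120, 120, 1), (80, 64, 2)]

total_nodes = nodes_needed(segments)          # 44 + 2 + 1 + 2
busiest = max(cores_per_node(segments))       # largest per-node core demand
print(total_nodes, busiest)
```

With the representative numbers this gives 49 nodes and at most 128 cores in use per node. The per-node figure must not exceed the core count of the target machine, so a real implementation would need to check it against each platform's node size.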