DWesl closed this issue.
@DWesl This was recently fixed in the global-workflow as part of an overhaul of the resource configuration system. The job now runs with a single task by default. See https://github.com/NOAA-EMC/global-workflow/pull/2804 and let me know if updating your global-workflow resolves the issue.
The new setting in verif-global (should have checked this earlier) references `nproc` and defaults to one:
https://github.com/NOAA-EMC/EMC_verif-global/blob/92904d2c431969345968f74e676717057ec0042a/ush/run_verif_global_in_global_workflow.sh#L277-L279
and global-workflow sets `nproc` in `config.metp`, which should solve the problem more generally.
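The hand-off described here comes down to shell default expansion. The sketch below is an assumption about its shape, not the actual lines in `config.metp` or the linked script. `set_metp_nproc` and `wrapper_nproc` are hypothetical stand-ins, and the `${nproc:-1}` form is assumed rather than copied from verif-global.

```shell
#!/bin/sh
# Hypothetical illustration only; the real config.metp and
# run_verif_global_in_global_workflow.sh lines may differ.

# Stand-in for config.metp: export the task count for the METplus job.
set_metp_nproc() {
  nproc=$1
  export nproc
}

# Stand-in for the verif-global wrapper: use the exported nproc, or fall
# back to a single task when nproc is unset or empty.
wrapper_nproc() {
  echo "${nproc:-1}"
}
```

The `:-` form also covers an empty `nproc`. `${nproc-1}` would fall back only when the variable is unset.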
For a C768 run, global-workflow produces a job specification with `nodes=1`, `ppn=4`, and `tpp=1`. Running with `ush/run_verif_global_in_global_workflow.sh` produces a job with `nproc=${npe_node_metp_gfs}`, which comes out to 1.

When run on HERA, `scripts/exgrid2grid_step1.sh` launches the METplus job with `srun --multi-prog /path/to/task-file`, where `task-file` has `nproc` lines detailing the commands to execute. `srun` then fails because it can't find as many tasks as it wants. I think it defaults to four tasks (presumably from `ppn=4`), while the task file only describes `nproc` of them. Changing `scripts/exgrid2grid_step1.sh` to pass `--ntasks ${nproc}` to the `srun` command allows the process to finish.

A better solution probably involves changing how `ush/run_verif_global_in_global_workflow.sh` determines `nproc`. `man sbatch` suggests `SLURM_NTASKS`, but global-workflow probably has a variable for the number of tasks that would be less closely tied to the job manager.
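Below is a minimal sketch of the workaround and the suggested follow-up, written in plain `sh`. Several details are assumptions for illustration and are not taken from EMC_verif-global:

- the helper names `resolve_nproc`, `write_task_file` and `launch_tasks`;
- the `SRUN` override;
- the precedence `nproc`, then `SLURM_NTASKS`, then 1.

The one Slurm behaviour it relies on is that `srun --multi-prog` reads a configuration file with one `<task rank> <command>` line per task. Passing `--ntasks` equal to the number of lines keeps the two in step.

```shell
#!/bin/sh
# Sketch only: function names, the SRUN override, and the precedence order
# are hypothetical; they are not taken from EMC_verif-global.

# Task count: an explicit nproc (e.g. exported by config.metp) wins, then
# Slurm's SLURM_NTASKS, then a single task.
resolve_nproc() {
  echo "${nproc:-${SLURM_NTASKS:-1}}"
}

# Write an srun --multi-prog configuration file with one
# "<task rank> <command>" line per command, ranks starting at 0.
write_task_file() {
  task_file=$1
  shift
  : > "$task_file"
  rank=0
  for cmd in "$@"; do
    printf '%s %s\n' "$rank" "$cmd" >> "$task_file"
    rank=$((rank + 1))
  done
}

# Launch with an explicit --ntasks so srun starts as many tasks as the task
# file describes, instead of inheriting the allocation's task count.
# SRUN can be overridden (e.g. SRUN=echo) to print the command instead.
launch_tasks() {
  "${SRUN:-srun}" --ntasks "$2" --multi-prog "$1"
}
```

Inside a job, calling `write_task_file "$task_file" "cmd0" "cmd1"` and then `launch_tasks "$task_file" "$(resolve_nproc)"` mirrors the `--ntasks ${nproc}` fix. Deriving the count from the file itself, with `wc -l < "$task_file"`, is an alternative that cannot disagree with the task file.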