NCAR / DART

Data Assimilation Research Testbed
https://dart.ucar.edu/
Apache License 2.0

bug: missing launch_cf.sh for CESM clean up #642

Open Nuo-Chen opened 4 months ago

Nuo-Chen commented 4 months ago

Describe the bug

Which model(s) are you working with?

I am running CAM-SE with DART, using a modified version of assimilate.csh.template. In the final section, where the script cleans up the large DART files, it calls compress.csh.

Error Message

When I ran compress.csh on its own, compress.out reported: Can't find executable launch_cf.sh
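A pre-flight check along the following lines could surface this problem before the MPI launch is attempted. This is only a minimal sketch: the function name check_wrapper is hypothetical, and it assumes the wrapper is expected in the current run directory as ./launch_cf.sh. It is not part of compress.csh.

```shell
#!/bin/bash
# Hypothetical helper, not part of DART: verify that a wrapper script
# exists and is executable before handing it to the MPI launcher.
check_wrapper() {
   local wrapper=$1
   if [ ! -f "$wrapper" ]; then
      echo "missing: $wrapper" >&2
      return 1
   elif [ ! -x "$wrapper" ]; then
      echo "not executable: $wrapper (try chmod +x)" >&2
      return 2
   fi
   return 0
}

# Assumed location of the wrapper: the current run directory.
if check_wrapper ./launch_cf.sh; then
   echo "launch_cf.sh found"
else
   echo "launch_cf.sh unavailable; copy one into the run directory first"
fi
```

Return code 1 means the file is absent and 2 means it is present but lacks execute permission, so a calling script can tell the two cases apart.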

I got it working by using a launch_cf.sh that I found elsewhere and changing the launch line to mpiexec -n $task ./launch_cf.sh ./mycmdfile, since mpiexec_mpt is not supported on Derecho. The launch_cf.sh I used:

#!/bin/bash
#
# DART software - Copyright UCAR. This open source software is provided
# by UCAR, "as is", without charge, subject to all terms of use at
# http://www.image.ucar.edu/DAReS/DART/DART_download
#
# $Id$

# for command file jobs.
# Sidd Ghosh Feb 22, 2017
# Slurm added by Kevin Raeder July 6, 2019

# On casper's PBS the PMI_RANK variable is not defined,
# but  OMPI_COMM_WORLD_LOCAL_RANK is (different 
# from OMPI_COMM_WORLD_RANK, which is also defined)
# Also, PBS_O_WORKDIR is defined, so it goes into the wrong section.
# The system launch_cf.sh (cheyenne only) tests directly for -z {env_var_name}
# export 

# echo "launch_cf.sh; OMPI_COMM_WORLD_LOCAL_RANK = $OMPI_COMM_WORLD_LOCAL_RANK"
# cheyenne has this, but calling script needs MPI_SHEPHERD
# echo "              PMI_RANK = $PMI_RANK"

if [ ! -z "$PMI_RANK" ]; then
   line=$(expr $PMI_RANK + 1)
#    echo "launch_cf.sh using PMI_RANK with line = $line"
elif [ ! -z "$OMPI_COMM_WORLD_RANK" ]; then
   line=$(expr $OMPI_COMM_WORLD_RANK + 1)
#    echo "launch_cf.sh using OMPI_COMM_WORLD_RANK with line = $line"
else
   echo "Batch environment is unknown"
   exit 11
fi

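# Pick the command on line number $line of the command file given as $1.
# Quoting keeps paths with spaces intact.
INSTANCE=$(sed -n "${line}p" "$1")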

# The following command showed that 563 tasks are launched within .3 seconds.
# echo "launching $INSTANCE at "; date --rfc-3339=ns

eval "$INSTANCE"

# <next few lines under version control, do not edit>
# $URL$
# $Id$
# $Revision$
# $Date$
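The rank-to-line mapping used by the script above can be tried without MPI. The sketch below is illustrative only: select_command is a hypothetical helper and the command file contents are made up. It mimics what each MPI task does with its own rank (ranks start at 0, sed line numbers start at 1).

```shell
#!/bin/bash
# Hypothetical demo: simulate how a launch_cf.sh-style wrapper maps an
# MPI rank to one line of a command file. No MPI is required.

# select_command RANK CMDFILE: print the command stored on line RANK+1.
select_command() {
   local rank=$1 cmdfile=$2
   sed -n "$((rank + 1))p" "$cmdfile"
}

# Build a throwaway command file: one command per line, one line per task.
cmdfile=$(mktemp)
printf '%s\n' 'echo task A' 'echo task B' 'echo task C' > "$cmdfile"

# Pretend to be ranks 0, 1 and 2 in turn. Under mpiexec, each task would
# do this once, taking its rank from PMI_RANK or OMPI_COMM_WORLD_RANK.
for rank in 0 1 2; do
   cmd=$(select_command "$rank" "$cmdfile")
   eval "$cmd"
done

rm -f "$cmdfile"
```

Run as written, the loop prints task A, task B and task C in order. A rank beyond the last line of the command file yields an empty command, so the number of tasks passed to mpiexec -n should match the number of lines in the command file.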

Version of DART

v10.8.4

Have you modified the DART code?

No

Build information

  1. The machine you are running on: Derecho
  2. The compiler you are using: Intel

hkershaw-brown commented 4 months ago

Hi Nuo-Chen, thanks for reporting this. We'll get these Cheyenne-specific scripts updated for Derecho.

Note there is a Derecho version of launch_cf provided by CISL: https://ncar-hpc-docs.readthedocs.io/en/latest/pbs/job-scripts/?h=launch_cf#pbs-job-arrays

Let us know if you hit any more problems.

Cheers, Helen

Edit: this also applies to the latest version, v11.