ESMCI / cime

Common Infrastructure for Modeling the Earth
http://esmci.github.io/cime

job dependencies using --prereq and --batch-args behave differently #4693

Closed · minxu74 closed this issue 1 month ago

minxu74 commented 1 month ago

Two jobs, A and B: job B (with CONTINUE_RUN=.TRUE.) depends on job A, and job A generates the restart files that B needs. Submitting B with --prereq works, but submitting it with --batch-args does not.
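
For illustration, the workflow in question looks roughly like this (a sketch; JOBID stands for whatever job ID the scheduler assigns to job A, and the exact flags are spelled out in the reproduction steps below):

    ./case.submit                                               # job A: runs and writes the restart files
    ./xmlchange CONTINUE_RUN=TRUE
    ./case.submit --prereq JOBID                                # job B, variant 1: passes
    ./case.submit --batch-args="--dependency=afterok:JOBID"     # job B, variant 2: fails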

jedwards4b commented 1 month ago

@minxu74 can you please provide detailed instructions to reproduce the issue? In particular, the output of ./preview_run for each of the two cases might be very revealing. Also, please indicate which cime tag you are using. Thanks.

minxu74 commented 1 month ago

@jedwards4b Thanks a lot for your quick response.

E3SM: fc020071ceb7e5f37ee2d39f3d2f18084fdf8148
CIME: 0cdd4b1c5c5eb2e29c6ec64667724af434847bcf

The simplified steps to reproduce the problem are as follows:

1. ./create_newcase --case test_case --mach pm-cpu --compset I1850CNPRDCTCBC --res hcru_hcru --mpilib mpich --walltime 24:00:00 --handle-preexisting-dirs u --project xxxx --compiler intel
2. ./xmlchange STOP_N=20
3. ./xmlchange REST_N=20
4. ./case.setup -r
5. ./case.build
6. ./case.submit  (JOB A with JOBID)
7. ./xmlchange CONTINUE_RUN=TRUE
8. (JOB B)
   - ./case.submit --prereq JOBID (passed)
   - ./case.submit --batch-args="--dependency=afterok:JOBID" (failed)

Both submissions produce the same ./preview_run output:

CASE INFO:  
 nodes: 12  
 total tasks: 1536  
 tasks per node: 128  
 thread count: 1  
 ngpus per node: 0  

BATCH INFO:  
 FOR JOB: case.run  
   ENV:  
     Setting Environment ADIOS2_ROOT=/global/cfs/cdirs/e3sm/3rdparty/adios2/2.9.1/cray-mpich-8.1.25/intel-2023.1.0  
     Setting Environment BLA_VENDOR=Intel10_64_dyn  
     Setting Environment FI_CXI_RX_MATCH_MODE=software  
     Setting Environment GATOR_INITIAL_MB=4000MB  
     Setting Environment HDF5_USE_FILE_LOCKING=FALSE  
     Setting Environment MOAB_ROOT=/global/cfs/cdirs/e3sm/software/moab/intel  
     Setting Environment MPICH_COLL_SYNC=MPI_Bcast  
     Setting Environment MPICH_ENV_DISPLAY=1  
     Setting Environment MPICH_MPIIO_DVS_MAXNODES=1  
     Setting Environment MPICH_VERSION_DISPLAY=1  
     Setting Environment NETCDF_PATH=/opt/cray/pe/netcdf-hdf5parallel/4.9.0.9/intel/2023.2  
     Setting Environment OMP_NUM_THREADS=1  
     Setting Environment OMP_PLACES=threads  
     Setting Environment OMP_PROC_BIND=spread  
     Setting Environment OMP_STACKSIZE=128M  
     Setting Environment PERL5LIB=/global/cfs/cdirs/e3sm/perl/lib/perl5-only-switch  
     Setting Environment PNETCDF_PATH=/opt/cray/pe/parallel-netcdf/1.12.3.9/intel/2023.2  

   SUBMIT CMD:  
     sbatch --time 48:00:00 -q regular --account xxxx test_case/.case.run --resubmit  

   MPIRUN (job=case.run):  
     srun  --label  -n 1536 -N 12 -c 2  --cpu_bind=cores   -m plane=128 test_case/bld/e3sm.exe   >> e3sm.log.$LID 2>&1
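
(For reference: with either submission variant the dependency should simply be appended to the sbatch line above. Assuming Slurm on pm-cpu, the effective submit command would be expected to look something like the following sketch, with JOBID being job A's ID.)

    sbatch --time 48:00:00 -q regular --account xxxx --dependency=afterok:JOBID test_case/.case.run --resubmit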

jedwards4b commented 1 month ago

I think that I misread the issue - can you please clarify? My understanding is that using the --prereq flag works as expected, but using --batch-args does not. Is that correct?

minxu74 commented 1 month ago

> I think that I misread the issue - can you please clarify? My understanding is that using the --prereq flag works as expected, but using --batch-args does not. Is that correct?

Yes.

jedwards4b commented 1 month ago

Using cime6.1.29 (cesm3) I have checked that both of these methods create the same sbatch command:

    sbatch --time 02:00:00 -q debug --account mp9 --dependency=afterok:99999 /global/u1/j/jedwards/cesm3/cime/scripts/caseB/.case.run --resubmit

I also checked cime hash 0cdd4b1c5c5eb2e29c6ec64667724af434847bcf, and both methods appear to work the same. Can you capture and post the output of the case.submit command for each of these methods? I'm interested in the sbatch command generated in each case.

I think what may be wrong is that you are specifying the case.run JOBID when you need to specify the case.st_archive JOBID. If you specify the case.run JOBID you create a race condition in which the second case may start before the case.st_archive from the first run has completed, and that would produce the error you are reporting.
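
(For illustration, a sketch of the chaining suggested above, assuming short-term archiving is enabled; the job IDs are placeholders for the IDs that case.submit reports for case.run and case.st_archive.)

    ./case.submit                      # job A: submits case.run (e.g. 100001) and case.st_archive (e.g. 100002)
    ./xmlchange CONTINUE_RUN=TRUE
    ./case.submit --prereq 100002      # job B: depend on A's case.st_archive job, not its case.run job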

minxu74 commented 1 month ago

The short-term archive was turned off for both jobs, and I expected both jobs to generate the same sbatch command. The error comes from CIME checking for the restart files when CONTINUE_RUN=.TRUE. and --batch-args is used; since job A has not run yet at submit time, those files do not exist and the check fails. CIME skips that check when --prereq is used.
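
(Seen from the user side, the check that --batch-args triggers presumably looks for the restart/rpointer files that job A will only produce once it finishes, so at submit time there is nothing for it to find. A sketch, with JOBID a placeholder for job A's ID:)

    RUNDIR=$(./xmlquery RUNDIR --value)
    ls "$RUNDIR"/rpointer.* 2>/dev/null                         # nothing there until job A has run
    ./case.submit --prereq JOBID                                # submits; the restart check is skipped
    ./case.submit --batch-args="--dependency=afterok:JOBID"     # aborts; restart files not found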

jedwards4b commented 1 month ago

Thank you - that helps a lot. Using --prereq has the side effect of skipping the CONTINUE_RUN check. I'm not sure how that same side effect could be implemented for the --batch-args method. Would simply clarifying the documentation of this feature be an acceptable solution?

minxu74 commented 1 month ago

Thanks. Yes, it would be helpful for the documentation to clarify the difference between --prereq and --batch-args with regard to the side effect of skipping the restart-file check.