Closed yichengt900 closed 1 month ago
Hi @uwagura, when you have a chance, can you try the following to see if you can run the NEP10.COBALT example:
git clone -b feature/nep_update https://github.com/NOAA-GFDL/CEFI-regional-MOM6.git --recursive
cd CEFI-regional-MOM6/builds;
sbatch ci_build_driver_c6.sh; # wait for the build process
cd ../exps
ln -fs /gpfs/f6/ira-cefi/world-shared/datasets ./
cd NEP10.COBALT
sbatch driver.sh # This will run three cases. Check the stdout to see if it has completed successfully.
Thanks!
@yichengt900, I got the following output from driver.sh:
Test started: Thu 24 Oct 2024 01:59:05 PM EDT
link datasets ...
/gpfs/f6/ira-cefi/scratch/Utheri.Wagura/CEFI-regional-MOM6/exps /gpfs/f6/ira-cefi/scratch/Utheri.Wagura/CEFI-regional-MOM6/exps/NEP10.COBALT
/gpfs/f6/ira-cefi/scratch/Utheri.Wagura/CEFI-regional-MOM6/exps/NEP10.COBALT
clean RESTART folders ...
run 20x56 48hrs test ...
run 20x56 24hrs test ...
link restart files ...
/gpfs/f6/ira-cefi/scratch/Utheri.Wagura/CEFI-regional-MOM6/exps/NEP10.COBALT/INPUT /gpfs/f6/ira-cefi/scratch/Utheri.Wagura/CEFI-regional-MOM6/exps/NEP10.COBALT
/gpfs/f6/ira-cefi/scratch/Utheri.Wagura/CEFI-regional-MOM6/exps/NEP10.COBALT
run 20x56 24hrs rst test ...
compare MOM.res*.nc
Error: MOM.res*.nc is not identical, please check! Exiting now...
Thanks, @uwagura. Hmm, that's interesting because the same case just passed using the GHA workflow. I’m wondering if it might be a permissions issue, since the restart folders are soft-linked to a folder I created under project-shared. Could you check the log files (out1, out2, out3, and err1, err2, err3) in the same folder and see if there are any further error messages?
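For anyone repeating this check, here is a minimal sketch of scanning those log files for error lines. It assumes only the file names mentioned above (out1 to out3, err1 to err3); the function name `scan_logs` and the search patterns are illustrative and not part of driver.sh.

```shell
#!/bin/sh
# Sketch: print FATAL / ERROR / permission lines from the test logs.
# Assumption: logs are named out1..out3 and err1..err3 in the given directory.
scan_logs() {
    dir="${1:-.}"
    for f in "$dir"/out[1-3] "$dir"/err[1-3]; do
        # An unmatched pattern is left as literal text, so skip non-files.
        [ -f "$f" ] || continue
        # -H prints the file name, -n the line number of each match.
        grep -HnE 'FATAL|ERROR|Permission denied' "$f" || true
    done
}
```

Running `scan_logs .` from inside exps/NEP10.COBALT would list each matching line prefixed with its file name and line number.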
@yichengt900 Yeah, it looks like permissions are part of the problem. From the end of my out1 file:
FATAL from PE 0: NETCDF ERROR: Permission denied File=RESTART/MOM.res.nc
Not sure if it's relevant or if it's just an offshoot of this problem, but my job error file only contains the following line:
024-10-24 14:39:07.727445 -0400 ERROR /tmp/Sauesw.User/spack-stage/spack-stage-nccmp-1.9.1.0-3j7eylkx5uoiqoxr5kewcghorprw5mfy/spack-src/src/nccmp_state.c:163 File not found: ./RESTART_24hrs_rst/MOM.res*.nc
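The "File not found" message above shows the literal pattern `MOM.res*.nc`, which is what a shell passes along when a glob matches nothing. A hedged sketch of a comparison step that checks for that case first is below. It is not the code in driver.sh: the function name `compare_restarts` and the `COMPARE` variable are hypothetical, and `cmp -s` (a byte-for-byte comparison) stands in for a NetCDF-aware tool such as nccmp.

```shell
#!/bin/sh
# Sketch: compare restart files between a reference and a new run directory.
# Assumption: COMPARE is a placeholder; `cmp -s` stands in for a NetCDF tool.
COMPARE="${COMPARE:-cmp -s}"

compare_restarts() {
    ref_dir="$1"
    new_dir="$2"

    # Expand the glob into the positional parameters. If nothing matches,
    # the literal pattern remains, so check that the first entry exists.
    set -- "$new_dir"/MOM.res*.nc
    if [ ! -e "$1" ]; then
        echo "no restart files found in $new_dir" >&2
        return 2
    fi

    for f in "$@"; do
        name=$(basename "$f")
        # COMPARE is intentionally unquoted so its options split into words.
        if ! $COMPARE "$ref_dir/$name" "$f"; then
            echo "$name differs" >&2
            return 1
        fi
    done
    echo "restart files are identical"
}
```

With this shape, a run whose restart files were never written (for example because of the permission error in out1) returns status 2 and reports missing files, rather than a "not identical" result.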
@uwagura, thanks! In this case, would you mind trying
sbatch run.sub
within the same folder? I just want to ensure that anyone who wants to try NEP10 can run this example successfully on C6.
@yichengt900, it looks like this example ran successfully.
Thanks, @uwagura! It would be great if you could also double-check for any typos in this README and approve the PR when you get a chance. Thanks again!
Hi @uwagura, have you had a chance to look over this README? If there aren’t any obvious typos or mistakes, it would be great if you could approve this PR when you get a moment. Thanks!
As titled, this PR updates the model configurations for our NEP10.COBALT example case, aligning it with MOM6-COBALT-NEP10k v1.0. We've also made the NEP10 XML publicly available. We have also activated the NEP10 regression testing, which will now check for reproducibility across restarts in the NEP10 domain. CC @amoebaliz.
NEP10 regression testing has not been activated yet, but we will address that in a following PR.