NOAA-GFDL / CEFI-regional-MOM6

A repository containing essential tools, XML files, and source codes for collaborators of the Climate, Ecosystems, and Fisheries Initiative (CEFI) to conduct simulations.

Update NEP10.COBALT Example and Add NEP10 XML Based on MOM6-COBALT-NEP10k v1.0 Configurations #105

Closed yichengt900 closed 1 month ago

yichengt900 commented 1 month ago

As titled, this PR updates the model configurations for our NEP10.COBALT example case, aligning it with MOM6-COBALT-NEP10k v1.0. We've also made the NEP10 XML publicly available and activated the NEP10 regression testing, which now checks for reproducibility across restarts in the NEP10 domain. CC @amoebaliz.

yichengt900 commented 1 month ago

Hi @uwagura, when you have a chance, can you try the following to see if you can run the NEP10.COBALT example?

git clone -b feature/nep_update https://github.com/NOAA-GFDL/CEFI-regional-MOM6.git --recursive
cd CEFI-regional-MOM6/builds
sbatch ci_build_driver_c6.sh  # wait for the build to finish
cd ../exps
ln -fs /gpfs/f6/ira-cefi/world-shared/datasets ./
cd NEP10.COBALT
sbatch driver.sh # This will run three cases. Check the stdout to see if it has completed successfully.

Thanks!

uwagura commented 1 month ago

@yichengt900, I got the following output from driver.sh:

Test started:   Thu 24 Oct 2024 01:59:05 PM EDT
link datasets ...
/gpfs/f6/ira-cefi/scratch/Utheri.Wagura/CEFI-regional-MOM6/exps /gpfs/f6/ira-cefi/scratch/Utheri.Wagura/CEFI-regional-MOM6/exps/NEP10.COBALT
/gpfs/f6/ira-cefi/scratch/Utheri.Wagura/CEFI-regional-MOM6/exps/NEP10.COBALT
clean RESTART folders ...
run 20x56 48hrs test ...
run 20x56 24hrs test ...
link restart files ...
/gpfs/f6/ira-cefi/scratch/Utheri.Wagura/CEFI-regional-MOM6/exps/NEP10.COBALT/INPUT /gpfs/f6/ira-cefi/scratch/Utheri.Wagura/CEFI-regional-MOM6/exps/NEP10.COBALT
/gpfs/f6/ira-cefi/scratch/Utheri.Wagura/CEFI-regional-MOM6/exps/NEP10.COBALT
run 20x56 24hrs rst test ...
compare MOM.res*.nc
Error: MOM.res*.nc is not identical, please check! Exiting now...

yichengt900 commented 1 month ago

Thanks, @uwagura. Hmm, that's interesting because the same case just passed using the GHA workflow. I’m wondering if it might be a permissions issue, since the restart folders are soft-linked to a folder I created under project-shared. Could you check the log files (out1, out2, out3, and err1, err2, err3) in the same folder and see if there are any further error messages?
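
A quick way to test the permissions theory before re-running the job is to resolve the soft link and check whether the current user can write to its target. The snippet below is only a sketch: `check_writable_dir` is a hypothetical helper and is not part of driver.sh. It also assumes a `readlink` that supports `-f` (GNU coreutils).

```shell
# Hypothetical helper (not part of driver.sh): report whether a directory,
# possibly reached through a soft link, exists and is writable.
check_writable_dir() {
    dir="$1"
    # readlink -f follows soft links; fall back to the raw path if it fails
    target="$(readlink -f -- "$dir")" || target="$dir"
    if [ ! -d "$target" ]; then
        echo "missing: $dir -> $target"
        return 2
    fi
    if [ ! -w "$target" ]; then
        echo "not writable: $dir -> $target"
        return 1
    fi
    echo "ok: $dir -> $target"
}
```

Run from the experiment folder, `check_writable_dir RESTART` prints the resolved target and returns non-zero when the target is missing or not writable by the current user.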

uwagura commented 1 month ago

@yichengt900 Yeah, it looks like permissions are part of the problem. From the end of my out1 file:

FATAL from PE     0: NETCDF ERROR: Permission denied File=RESTART/MOM.res.nc

Not sure if it's relevant or if it's just an offshoot of this problem, but my job error file only contains the following line:

2024-10-24 14:39:07.727445 -0400 ERROR /tmp/Sauesw.User/spack-stage/spack-stage-nccmp-1.9.1.0-3j7eylkx5uoiqoxr5kewcghorprw5mfy/spack-src/src/nccmp_state.c:163 File not found: ./RESTART_24hrs_rst/MOM.res*.nc
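
The "File not found" message quotes the pattern `MOM.res*.nc` literally, which is what a shell passes along when a glob matches nothing. That suggests the restart files were never written, rather than that they differ. The sketch below shows one way a comparison step could separate the two cases. `compare_restarts` and `COMPARE_CMD` are hypothetical names, the reference directory name in the usage note is assumed, and `cmp -s` is a byte-level stand-in for a NetCDF-aware tool such as nccmp, whose options the thread does not show.

```shell
# Hypothetical sketch (not the logic in driver.sh): compare restart files
# in two directories and report "nothing to compare" separately from
# "files differ". COMPARE_CMD defaults to a byte-level cmp as a stand-in.
compare_restarts() {
    ref_dir="$1"; new_dir="$2"; pattern="${3:-MOM.res*.nc}"
    found=0; status=0
    for f in "$ref_dir"/$pattern; do
        # An unmatched glob stays a literal string, so skip non-files
        [ -f "$f" ] || continue
        found=1
        name="$(basename "$f")"
        if [ ! -f "$new_dir/$name" ]; then
            echo "missing: $new_dir/$name"
            status=1
        elif ! ${COMPARE_CMD:-cmp -s} "$f" "$new_dir/$name"; then
            echo "differs: $name"
            status=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no files match $ref_dir/$pattern"
        return 2
    fi
    return "$status"
}
```

For example, `compare_restarts RESTART_48hrs RESTART_24hrs_rst` (the first directory name is assumed here) returns 2 when no reference restart files exist, 1 when a file is missing or differs, and 0 when every matched file compares equal.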

yichengt900 commented 1 month ago

@uwagura, thanks! In this case, would you mind trying sbatch run.sub within the same folder? I just want to ensure that anyone who wants to try NEP10 can run this example successfully on C6.

uwagura commented 1 month ago

@yichengt900, it looks like this example ran successfully.

yichengt900 commented 1 month ago

Thanks, @uwagura! It would be great if you could also double-check for any typos in this README and approve the PR when you get a chance. Thanks again!

yichengt900 commented 1 month ago

Hi @uwagura, have you had a chance to look over this README? If there aren’t any obvious typos or mistakes, it would be great if you could approve this PR when you get a moment. Thanks!