CESR-lab / ucla-roms


Unable to produce exact restart #4

Open dafyddstephenson opened 4 months ago

dafyddstephenson commented 4 months ago

Me again 🙃

I am running Rivers_real from this repository on Expanse with the following settings in ocean_vars.opt:

      logical,parameter :: wrt_file_rst      = .true.     ! t/f to write module history file
      real,parameter    :: output_period_rst = 80          ! output period in seconds
      integer,parameter :: nrpf_rst          = 2          ! total recs per file

and with the do_roms_expanse.sh script modified as follows:

module purge
module load slurm
module load cpu/0.15.4  intel/19.1.1.217  mvapich2/2.3.4
module load netcdf-c/4.7.4
module load netcdf-fortran/4.5.3

if [ ! -d RST ];then mkdir RST;fi

## Run 0 begins from initial condition file
srun --mpi=pmi2 -n 6 roms rivers.in

for x in rivers_???.*.0.nc;do ncjoin ${x/.0.nc}.?.nc;done
cp rivers_rst.??????????????.?.nc RST/
rm rivers_???.??????????????.?.nc

## Run 1 begins from restart at timestep 1-2
srun --mpi=pmi2 -n 6 roms rivers.in_restart1

for x in rivers1_???.*.0.nc;do ncjoin ${x/.0.nc}.?.nc;done
cp rivers1_rst.??????????????.?.nc RST/
rm rivers1_???.??????????????.?.nc

## Use nco to difference the two files
module load cpu/0.15.4;module load gcc/10.2.0;module load openmpi/4.0.4;module load nco

ncdiff rivers1_rst.20121209133955.nc rivers_rst.20121209133955.nc rst_diff.20121209133955.nc

where rivers.in_restart1 is different from rivers.in as follows:

initial: NRREC  filename
          2
     RST/rivers_rst.20121209133555.nc
output_root_name:
     rivers1

This produces the file rst_diff.20121209133955.nc, which I believe should be zero everywhere, but it is not.
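To see how far from zero the difference is, each variable can be reduced to its largest absolute difference. A minimal sketch in Python, assuming the fields have already been read into plain lists; the variable names and values below are made up for illustration and are not taken from the actual restart files:

```python
def max_abs_diff(run_a, run_b):
    """Return {variable: largest |a - b|} for every variable present in both runs."""
    report = {}
    for name in sorted(set(run_a) & set(run_b)):
        a, b = run_a[name], run_b[name]
        if len(a) != len(b):
            raise ValueError(f"{name}: size mismatch ({len(a)} vs {len(b)})")
        report[name] = max((abs(x - y) for x, y in zip(a, b)), default=0.0)
    return report


# Hypothetical stand-ins for flattened fields from the two restart files.
continuous = {"temp": [10.0, 10.5, 11.0], "zeta": [0.01, 0.02, 0.03]}
restarted = {"temp": [10.0, 10.5, 11.0], "zeta": [0.01, 0.02, 0.03 + 1e-12]}

for name, diff in max_abs_diff(continuous, restarted).items():
    status = "identical" if diff == 0.0 else f"max |diff| = {diff:.3e}"
    print(f"{name}: {status}")
```

A per-variable summary like this separates differences at round-off level from ones that are orders of magnitude larger.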

nmolem commented 3 months ago

I've been working on this. One of the issues is that the mixing coefficients computed in lmd_kpp are time-averaged over 2 time steps. The coefficients are not stored in the restart file, and hence perfect restarting is broken. When I turn off the lmd_kpp and lmd_bkpp flags in cppdefs.opt, the restart appears to be good. I've added a new example to test that. It's currently on the new branch: 'compliant'.
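The mechanism can be illustrated with a toy model. This is a sketch in Python, not ROMS code: `mixing_coefficient`, `step`, and `run` are hypothetical stand-ins, and the only property carried over from the description above is that each step uses a coefficient averaged over two time steps.

```python
def mixing_coefficient(state):
    """Hypothetical stand-in for a coefficient computed from the current state."""
    return 0.1 + 0.01 * state


def step(state, prev_coeff):
    """Advance one step using the coefficient averaged over two time steps."""
    coeff = mixing_coefficient(state)
    # With no history available (cold start), fall back to the current value.
    avg = coeff if prev_coeff is None else 0.5 * (coeff + prev_coeff)
    return state - avg * state, coeff


def run(state, nsteps, prev_coeff=None):
    """Run nsteps steps; return the final state and the last coefficient."""
    for _ in range(nsteps):
        state, prev_coeff = step(state, prev_coeff)
    return state, prev_coeff


# Continuous run: 4 steps in one go.
full, _ = run(1.0, 4)

# Restart after 2 steps, saving only the state (coefficient history is lost).
half, saved_coeff = run(1.0, 2)
lossy, _ = run(half, 2)

# Restart after 2 steps, saving the state and the previous coefficient.
exact, _ = run(half, 2, prev_coeff=saved_coeff)

print("state-only restart matches:", lossy == full)
print("state + coefficient restart matches:", exact == full)
```

In this toy model the state-only restart diverges from the continuous run, while the restart that also carries the previous coefficient reproduces it exactly.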

matt-long commented 3 months ago

@nmolem, when you say the restart appears to be good, I think you are saying that we're down to round-off level differences, right?

You mentioned the idea today that there is perhaps an order of operations difference between a restart and time step computations.

Can we test this hypothesis by instrumenting the code in some way or is it just by inspection?
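For context on why an order-of-operations difference alone would produce round-off-level differences: floating-point addition is not associative, so the same terms summed in a different order can differ in the last bits. A minimal illustration in Python (generic arithmetic, not ROMS code):

```python
import struct


def bits(x):
    """Hex string of the IEEE-754 double-precision representation of x."""
    return struct.pack(">d", x).hex()


a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # sum the first two terms first
right = a + (b + c)   # sum the last two terms first

print("equal:", left == right)
print("bits :", bits(left), bits(right))
print("diff :", abs(left - right))
```

Comparing bit patterns like this, rather than printed decimals, is one way to tell an exact match from a last-bit difference.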

matt-long commented 3 months ago

@nmolem, do you plan to submit a PR to merge compliant back to main?

See #11 related to defining branches and use patterns.

TomNicholas commented 3 months ago

I would like to see the expected behaviour (and the solution to the restart issue) encoded in a regression test that’s run automatically in CI (see #6). If the problem can’t be reproduced there, then it must be Expanse-specific.
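A rough sketch of the shape such a test could take, written pytest-style in Python. Here `run_model` is a hypothetical stand-in with a made-up update rule; a real test would launch ROMS and read the restart files instead.

```python
def run_model(state, nsteps):
    """Hypothetical stand-in for a model run; returns the state after nsteps."""
    for _ in range(nsteps):
        state = [0.5 * (x + 1.0 / (1.0 + x)) for x in state]
    return state


def test_restart_is_bit_identical():
    initial = [0.0, 0.25, 1.0]
    continuous = run_model(initial, 4)
    # Stop after 2 steps, then continue from that saved state for 2 more.
    restarted = run_model(run_model(initial, 2), 2)
    # Exact equality on purpose: a tolerance would hide round-off-level drift.
    assert restarted == continuous, "restart differs from continuous run"


test_restart_is_bit_identical()  # pytest would discover and call this itself
```

The design choice is the exact comparison: the test passes only if the restarted run is identical to the continuous one, with no tolerance.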