Run unforced - Githubissues

jpolton commented 3 years ago

Does it timestep without forcing? Not a bad test to make sure ARCHER2 accounts are set up correctly.

This needs coordinates.bdy.nc, generated using PyNEMO. Then this script runs the test: https://github.com/JMMP-Group/SEVERN-SWOT/blob/master/SCRIPTS/run_unforced.sh

jpolton commented 3 years ago

This script submits this test: https://github.com/JMMP-Group/SEVERN-SWOT/blob/master/SCRIPTS/run_unforced.sh

jpolton commented 3 years ago

Looks like I need to create the boundary coordinates before this will work.

  ===>>> : E R R O R
          ===========
 ctypebdy must be N, S, E or W

Alternatively, running with ln_bdy=F produces the ssh > 20 m issue.

jpolton commented 3 years ago

Worked with two fixes:

1. coordinates.bdy.nc (from PyNEMO)
2. adjusting the viscosity to values more appropriate to a 500 m grid.

(PyNEMO instructions not yet ready)

jpolton commented 3 years ago

It worked. Over to @mpayopayo for testing/fixing/improving

jpolton commented 3 years ago

[Image: Screen_Capture_-_27_May__1_44_pm]

50 timesteps (20 s each). Unforced. SSH ~ 1E-12 m. This seems good.

jpolton commented 3 years ago

NB there is a weird southern boundary bit which should be land and nan (white) but is zeros (green) instead.

mpayopayo commented 3 years ago

@jpolton I'm getting

|U| max 10.90 at i j k 22 100 30 MPI rank 27

Things I've modified in the namelist_cfg are:

ln_bdy = .false.
ln_zinterp = .false.
ln_full_vel = .false.
!cn_dir = '/wo*
! filtide =*
rn_avm0 = 1.2e-4
rn_avt0 = 1.2e-5

Would you have any suggestions?

jpolton commented 3 years ago

Well there appear to be a number of differences between your namelist_cfg and mine:

diff /work/n01/n01/jelt/SEVERN-SWOT/RUN_DIRECTORIES/EXP_unforced/namelist_cfg /work/n01/n01/marpay/SEVERN-SWOT/RUN_DIRECTORIES/EXP_unforced/namelist_cfg
76c76
<    rn_rdt      =  20.     !  time step for the dynamics and tracer
---
>    rn_rdt      =  60.     !  time step for the dynamics and tracer
276c276
<     ln_bdy         = .true.   !  Use unstructured open boundaries
---
>     ln_bdy         = .false.   !  Use unstructured open boundaries MPP false
279c279
<     ln_coords_file = .true.                                 !  =T : read bdy coordinates from file
---
>     ln_coords_file = .false.                                  !  =T : read bdy coordinates from file
324c324
<    ln_zinterp  = .true.       !  T if a vertical interpolation is required. Variables gdep[tuv] and e3[tuv] must exist in the file
---
>    ln_zinterp  = .false.       !  T if a vertical interpolation is required. Variables gdep[tuv] and e3[tuv] must exist in the file MPP changed
326c326
<    ln_full_vel = .true.       !  T if [uv]3d are "full" velocities and not only its baroclinic components
---
>    ln_full_vel = .false.       !  T if [uv]3d are "full" velocities and not only its baroclinic components MPP changed
328c328
<    cn_dir  =    './OBC/'
---
>    !cn_dir  =    '/work/n01/n01/annkat/SEAsia_R36_R/BOUNDARY_FORCING/2000/' MPP comment
343c343
<    filtide      = 'TIDES/SEAsia_HAD_bdytide_rotT_'      !  file name root of tidal forcing files
---
> !   filtide      = 'TIDES/SEAsia_HAD_bdytide_rotT_'      !  file name root of tidal forcing files MPP comment
486,487c486,487
<    rn_Uv      = 0.01 !0.02    !  lateral viscous velocity [m/s]
<    rn_Lv      = 200      !  lateral viscous length   [m]
---
>    rn_Uv      = 0.04 !0.02    !  lateral viscous velocity [m/s]
>    rn_Lv      = 5.e+3      !  lateral viscous length   [m]
514,515c514,515
<    rn_avm0     =  1.2e-6      !  vertical eddy viscosity   [m2/s]       (background Kz if ln_zdfcst=F)
<    rn_avt0     =  1.2e-6      !  vertical eddy diffusivity [m2/s]       (background Kz if ln_zdfcst=F)
---
>    rn_avm0     =  1.2e-4 !6MPP      !  vertical eddy viscosity   [m2/s]       (background Kz if ln_zdfcst=F)
>    rn_avt0     =  1.2e-5 !6MPP      !  vertical eddy diffusivity [m2/s]       (background Kz if ln_zdfcst=F)

Mine is up to date with the repo, so you might want to do a git pull to get any changes you are missing (you can learn a lot about git from YouTube; that's what I did).

I would expect/hope that it should work fine if we have the same namelist_cfg files....

jpolton commented 3 years ago

(BTW I suspect that it is the lateral viscosity and time step differences that are killing your run)
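A rough way to sanity-check lateral viscosity against grid spacing and time step is sketched below. It assumes the constant-coefficient convention given in the NEMO 4 namelist comments (A = Uv*Lv/2 for a Laplacian operator, A = Uv*Lv^3/12 for a bilaplacian) and the explicit-diffusion stability bounds quoted in the NEMO manual (A < dx^2/(8 dt) and A < dx^4/(64 dt)). Which operator this configuration uses is not stated in this thread, and the function names are hypothetical, so treat it as an illustration rather than a diagnosis.

```python
def lateral_viscosity(u_v, l_v, bilaplacian=False):
    """Constant lateral viscosity from a velocity scale u_v [m/s] and a
    length scale l_v [m] (assumed NEMO 4 convention, see text above)."""
    if bilaplacian:
        return u_v * l_v**3 / 12.0  # [m4/s]
    return 0.5 * u_v * l_v          # [m2/s]


def stability_limit(dx, dt, bilaplacian=False):
    """Upper bound on the viscosity for grid spacing dx [m] and time step
    dt [s] (assumed bounds from the NEMO manual, see text above)."""
    if bilaplacian:
        return dx**4 / (64.0 * dt)
    return dx**2 / (8.0 * dt)


def is_stable(u_v, l_v, dx, dt, bilaplacian=False):
    """True if the implied viscosity sits below the stability bound."""
    visc = lateral_viscosity(u_v, l_v, bilaplacian)
    return visc < stability_limit(dx, dt, bilaplacian)


if __name__ == "__main__":
    dx = 500.0  # grid spacing [m], taken from the "500m grid" remark
    # (rn_Uv, rn_Lv, rn_rdt) pairs taken from the two sides of the diff
    cases = {"jelt": (0.01, 200.0, 20.0), "marpay": (0.04, 5.0e3, 60.0)}
    for name, (u_v, l_v, dt) in cases.items():
        for blp in (False, True):
            label = "bilaplacian" if blp else "laplacian"
            print(name, label, is_stable(u_v, l_v, dx, dt, bilaplacian=blp))
```

Under these assumptions both parameter sets pass the Laplacian bound, but with a bilaplacian operator rn_Uv = 0.04, rn_Lv = 5.e+3 and rn_rdt = 60 exceed the bound by more than an order of magnitude, while rn_Uv = 0.01, rn_Lv = 200 and rn_rdt = 20 stay well below it.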

mpayopayo commented 3 years ago

@jpolton I did git pull so I'm using the same files. It doesn't break but I don't get any output file. Is that because the run time in submit.slurm is too short? There's no difference between yours and mine, though.

jpolton commented 3 years ago

You set the number of timesteps to run in the namelist_cfg. You set the wall time (allowable compute time) in the submit.slurm. You set the output frequency in the file_def*xml. Some combination of tuning these will give you output. You can check run.stat to get a summary of how the model is behaving. The aim of this run is to get something that timesteps and is stable. You now have that. Output will not be interesting. I recommend you move on to the tide-forced run.
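These three controls interact through simple arithmetic: the simulated time is the number of timesteps multiplied by rn_rdt, and an output record only appears if at least one output interval fits inside that simulated time (and the job finishes within the wall time). A minimal sketch, assuming the run length is set with nn_it000 and nn_itend (NEMO's first and last timestep; these names do not appear in this thread) and that the values are plain numbers. The helper names are hypothetical.

```python
import re


def read_namelist_values(text, names):
    """Pick numeric 'name = value' entries out of namelist text."""
    values = {}
    for line in text.splitlines():
        code = line.split("!", 1)[0]  # drop Fortran '!' comments
        match = re.match(r"\s*(\w+)\s*=\s*([^\s,]+)", code)
        if match and match.group(1) in names:
            values[match.group(1)] = float(match.group(2))
    return values


def simulated_seconds(values):
    """Simulated time covered by the run, in seconds."""
    n_steps = int(values["nn_itend"] - values["nn_it000"]) + 1
    return n_steps * values["rn_rdt"]


def output_records(sim_seconds, output_interval_seconds):
    """Number of complete output intervals that fit in the run."""
    return int(sim_seconds // output_interval_seconds)


# Illustrative namelist fragment: 50 timesteps of 20 s, as in the test above.
example = """
   nn_it000    =  1       !  first time step (assumed name)
   nn_itend    =  50      !  last time step (assumed name)
   rn_rdt      =  20.     !  time step for the dynamics and tracer
"""

if __name__ == "__main__":
    vals = read_namelist_values(example, {"nn_it000", "nn_itend", "rn_rdt"})
    total = simulated_seconds(vals)
    print("simulated seconds:", total)
    print("hourly records:", output_records(total, 3600))
```

With 50 timesteps of 20 s the run covers 1000 s of model time, so an hourly output stream would write nothing even though the model ran correctly.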

jpolton commented 3 years ago

@mpayopayo You are correct. Your unforced run did not work. I've just peeked inside /work/n01/n01/marpay/SEVERN-SWOT/RUN_DIRECTORIES/EXP_unforced

run.stat is empty. It usually gives useful numbers about max velocity and the like.

In ocean.output there is no evidence of time stepping. Usually you see "kt = .. " when the time step increments. You don't have any of that.
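The two checks described here (is run.stat empty, and does ocean.output contain any "kt =" lines) can be scripted. This is a minimal sketch: the function name is hypothetical, and the exact format of the "kt =" lines is assumed from the description above rather than taken from a real ocean.output.

```python
import re
from pathlib import Path


def summarise_run(run_dir):
    """Report whether a NEMO run directory shows signs of time stepping."""
    run_dir = Path(run_dir)
    run_stat = run_dir / "run.stat"
    ocean_output = run_dir / "ocean.output"

    # A missing run.stat is treated the same as an empty one.
    run_stat_empty = (not run_stat.exists()) or run_stat.stat().st_size == 0

    # Count lines that look like a time-step counter, e.g. "kt = 12".
    kt_lines = 0
    if ocean_output.exists():
        with ocean_output.open(errors="replace") as handle:
            kt_lines = sum(1 for line in handle if re.search(r"\bkt\s*=", line))

    return {
        "run_stat_empty": run_stat_empty,
        "kt_lines": kt_lines,
        "time_stepped": (not run_stat_empty) and kt_lines > 0,
    }


if __name__ == "__main__":
    # Point this at an experiment directory, e.g. EXP_unforced.
    print(summarise_run("."))
```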

Looking at submit.slurm it says you are trying to run on ACCORD budget. That might be a problem. I'd fix that and then see what happens.

mpayopayo commented 3 years ago

@jpolton I just ran with CLASS as the budget and I'm getting the same.

-rw-r--r-- 1 marpay n01       0 Jul  9 09:32 timing.output
-rw-r--r-- 1 marpay n01   77051 Jul  9 09:32 layout.dat
-rw-r--r-- 1 marpay n01  307361 Jul  9 09:32 ocean.output
-rw-r--r-- 1 marpay n01      10 Jul  9 09:32 time.step
-rw-r--r-- 1 marpay n01     292 Jul  9 09:32 run.stat.nc
-rw-r--r-- 1 marpay n01       0 Jul  9 09:32 run.stat
-rw-r--r-- 1 marpay n01  913347 Jul  9 09:32 output.namelist.dyn
-rw-r--r-- 1 marpay n01       0 Jul  9 09:32 communication_report.txt
-rw-r--r-- 1 marpay n01  150963 Jul  9 09:42 slurm-381082.out

jpolton commented 3 years ago

The 'good' news is that I can replicate this problem. I will see what I can now do to fix it.

mpayopayo commented 3 years ago

@jpolton, just in case, I'm doing this with the bathymetry that is missing the southwest patch of sea

jpolton commented 3 years ago

@mpayopayo @micdom The issue appears to be with I/O (perhaps XIOS). The clue is that there is no error (XIOS is like that...). I rebuilt everything from scratch and got the same effects with EXP_unforced that @mpayopayo had.

I commented out the every-time-step output in file_def_nemo-oce.xml. This allowed the run to go to the end of the specified number of time steps, but then it hangs around like a rogue process. Some output goes in the cfl* files, but there should be more in ocean.output. Hmm. I will have a think about what might be going wrong.

Also, I've updated the build domain wiki thing a bit more with an ARCHER2 method, to make things slicker.

mpayopayo commented 3 years ago

@jpolton could it be something to do with the environments? C. Wilson mentioned that there was new advice from Adam Blake on the modules to load to compile. If it came between when you set up the model and when I set up the model, maybe that could be the cause? This is from the MSML Teams chat, 28th & 29th June:


```
module load cpe/21.03
module load cray-hdf5-parallel
module load cray-netcdf-hdf5parallel

export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH
```

jpolton commented 3 years ago

I will try it now

jpolton commented 3 years ago

@mpayopayo The model hanging is definitely a different issue. Here is a new ticket https://github.com/JMMP-Group/SEVERN-SWOT/issues/19

mpayopayo commented 3 years ago

@jpolton, I've been trying to rerun the unforced model and the tides-only run, this time with the full bathymetry, i.e. including the SW bit as done by @micdom. I've got segmentation faults, and jobs pending because nodes are not available/idle on ARCHER2. It seems more like a problem with ARCHER2 than with the runs themselves.

jpolton commented 3 years ago

@mpayopayo Did you try the new queue decomposition method / new slurm script mentioned above? (https://github.com/JMMP-Group/SEVERN-SWOT/issues/19) I can't even log onto ARCHER2 at the moment!!

mpayopayo commented 3 years ago

@jpolton I tried and failed, so I was sticking to the "old" method. I can log in, in the sense that it takes my password and passphrase, but then it just stays there.

jpolton commented 3 years ago

@mpayopayo Hmm well I considered not being able to log in properly as an indication of issues beyond my control. In these situations if things work then great, but if things don't work it is hard to identify the root cause.

mpayopayo commented 3 years ago

@jpolton @micdom and it runs!!! I'm very confident that the problem was the bathymetry: with the full bathymetry it runs without problems. I'll change the wiki accordingly.

JMMP-Group / SEVERN-SWOT

Run unforced #13