jpolton closed this issue 3 years ago.
This script submits this test:
https://github.com/JMMP-Group/SEVERN-SWOT/blob/master/SCRIPTS/run_unforced.sh
Looks like I need to create boundary coordinates before this will work.
===>>> : E R R O R
===========
ctypebdy must be N, S, E or W
Alternatively, running with ln_bdy=F produces the SSH > 20 m issue.
Worked with two fixes:
(PyNEMO instructions not yet ready)
It worked. Over to @mpayopayo for testing/fixing/improving
50 time steps (20 s each), unforced. SSH ~ 1e-12 m. This seems good.
NB: there is a weird bit on the southern boundary which should be land and NaN (white) but is zeros (green) instead.
@jpolton I'm getting
|U| max 10.90 at i j k 22 100 30 MPI rank 27
Things I've modified in namelist_cfg are:
ln_bdy = .false.
ln_zinterp = .false.
ln_full_vel = .false.
!cn_dir = '/wo*
! filtide =*
rn_avm0 = 1.2e-4
rn_avt0 = 1.2e-5
Would you have any suggestions?
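The stop message quoted above has a fixed shape, so a small parser can pull out the velocity and its grid location for comparing runs. This is a minimal sketch that assumes the message always reads like the quoted line; the UMaxReport type, parse_umax and is_suspicious functions, and the 5 m/s threshold are hypothetical, not part of NEMO.

```python
import re
from typing import NamedTuple, Optional


class UMaxReport(NamedTuple):
    """Where the maximum velocity magnitude was reported (hypothetical type)."""
    umax: float  # |U| max [m/s]
    i: int       # grid index i
    j: int       # grid index j
    k: int       # model level k
    rank: int    # MPI rank that reported it


# Matches lines shaped like: "|U| max 10.90 at i j k 22 100 30 MPI rank 27"
_UMAX_RE = re.compile(
    r"\|U\|\s*max\s+([-+0-9.eEdD]+)\s+at\s+i\s+j\s+k\s+"
    r"(\d+)\s+(\d+)\s+(\d+)\s+MPI\s+rank\s+(\d+)"
)


def parse_umax(line: str) -> Optional[UMaxReport]:
    """Return the parsed report, or None if the line is not a |U| max message."""
    m = _UMAX_RE.search(line)
    if m is None:
        return None
    # Fortran can print exponents with 'D' (e.g. 1.2D+01); float() needs 'E'.
    umax = float(m.group(1).replace("D", "E").replace("d", "e"))
    i, j, k, rank = (int(g) for g in m.group(2, 3, 4, 5))
    return UMaxReport(umax, i, j, k, rank)


def is_suspicious(report: UMaxReport, limit: float = 5.0) -> bool:
    """Flag velocities above a hypothetical limit (5 m/s here; tune per domain)."""
    return report.umax > limit


report = parse_umax("|U| max 10.90 at i j k 22 100 30 MPI rank 27")
print(report, is_suspicious(report))
```

In an unforced run started from rest one would expect velocities close to zero, so a value such as 10.90 m/s points to a numerical instability rather than real dynamics.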
Well, there appear to be a number of differences between your namelist_cfg and mine:
diff /work/n01/n01/jelt/SEVERN-SWOT/RUN_DIRECTORIES/EXP_unforced/namelist_cfg /work/n01/n01/marpay/SEVERN-SWOT/RUN_DIRECTORIES/EXP_unforced/namelist_cfg
76c76
< rn_rdt = 20. ! time step for the dynamics and tracer
---
> rn_rdt = 60. ! time step for the dynamics and tracer
276c276
< ln_bdy = .true. ! Use unstructured open boundaries
---
> ln_bdy = .false. ! Use unstructured open boundaries MPP false
279c279
< ln_coords_file = .true. ! =T : read bdy coordinates from file
---
> ln_coords_file = .false. ! =T : read bdy coordinates from file
324c324
< ln_zinterp = .true. ! T if a vertical interpolation is required. Variables gdep[tuv] and e3[tuv] must exist in the file
---
> ln_zinterp = .false. ! T if a vertical interpolation is required. Variables gdep[tuv] and e3[tuv] must exist in the file MPP changed
326c326
< ln_full_vel = .true. ! T if [uv]3d are "full" velocities and not only its baroclinic components
---
> ln_full_vel = .false. ! T if [uv]3d are "full" velocities and not only its baroclinic components MPP changed
328c328
< cn_dir = './OBC/'
---
> !cn_dir = '/work/n01/n01/annkat/SEAsia_R36_R/BOUNDARY_FORCING/2000/' MPP comment
343c343
< filtide = 'TIDES/SEAsia_HAD_bdytide_rotT_' ! file name root of tidal forcing files
---
> ! filtide = 'TIDES/SEAsia_HAD_bdytide_rotT_' ! file name root of tidal forcing files MPP comment
486,487c486,487
< rn_Uv = 0.01 !0.02 ! lateral viscous velocity [m/s]
< rn_Lv = 200 ! lateral viscous length [m]
---
> rn_Uv = 0.04 !0.02 ! lateral viscous velocity [m/s]
> rn_Lv = 5.e+3 ! lateral viscous length [m]
514,515c514,515
< rn_avm0 = 1.2e-6 ! vertical eddy viscosity [m2/s] (background Kz if ln_zdfcst=F)
< rn_avt0 = 1.2e-6 ! vertical eddy diffusivity [m2/s] (background Kz if ln_zdfcst=F)
---
> rn_avm0 = 1.2e-4 !6MPP ! vertical eddy viscosity [m2/s] (background Kz if ln_zdfcst=F)
> rn_avt0 = 1.2e-5 !6MPP ! vertical eddy diffusivity [m2/s] (background Kz if ln_zdfcst=F)
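The diff above compares the files line by line, so it also flags comment edits. A sketch that compares the parsed settings instead, assuming simple one-assignment-per-line namelists like namelist_cfg; strip_comment, parse_namelist and diff_namelists are hypothetical helpers, and a commented-out entry (such as !cn_dir) shows up as None on that side.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (namelist group, variable name)


def strip_comment(line: str) -> str:
    """Drop a trailing '!' comment, ignoring '!' characters inside quotes."""
    quote = None
    for pos, ch in enumerate(line):
        if quote:
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == "!":
            return line[:pos]
    return line


def parse_namelist(text: str) -> Dict[Key, str]:
    """Map (group, name) -> raw value for simple 'name = value' lines.

    Assumes one assignment per line; arrays continued over several lines
    or several assignments on one line are not handled.
    """
    settings: Dict[Key, str] = {}
    group = ""
    for raw in text.splitlines():
        line = strip_comment(raw).strip()
        if not line:
            continue  # blank or fully commented-out line
        if line.startswith("&"):
            parts = line[1:].split()
            group = parts[0].lower() if parts else ""
        elif line == "/":
            group = ""
        elif "=" in line:
            name, value = line.split("=", 1)
            settings[(group, name.strip().lower())] = value.strip()
    return settings


def diff_namelists(mine: str, theirs: str) -> List[Tuple[Key, Optional[str], Optional[str]]]:
    """List settings whose values differ; None means absent on that side."""
    a, b = parse_namelist(mine), parse_namelist(theirs)
    return [(k, a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)]
```

Pass it the text of the two namelist_cfg files from the diff command above. For anything beyond simple scalar settings, a dedicated Fortran namelist parser such as the f90nml package would be more robust.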
Mine is up to date with the repo, so you might want to do a:
git pull
to get any changes you are missing (you can learn a lot about git from YouTube; that's what I did).
I would expect (or at least hope) that it will work fine if we have the same namelist_cfg files.
(BTW, I suspect that it is the lateral viscosity and time step differences that are killing your run.)
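To put rough numbers on the lateral viscosity and time step differences in the diff: this sketch assumes the constant-coefficient Laplacian option, in which NEMO 4 sets the viscosity to A = rn_Uv * rn_Lv / 2, and uses the textbook 2D explicit-diffusion bound A * dt / dx**2 <= 1/4 rather than NEMO's own time-stepping criterion. The grid spacing is not given in this thread, so the sketch reports the smallest dx that bound would allow; the function names are hypothetical.

```python
import math


def laplacian_viscosity(rn_uv: float, rn_lv: float) -> float:
    """Lateral eddy viscosity [m2/s], assuming A = rn_Uv * rn_Lv / 2 (Laplacian case)."""
    return 0.5 * rn_uv * rn_lv


def min_stable_dx(a: float, dt: float) -> float:
    """Smallest grid spacing [m] meeting A * dt / dx**2 <= 1/4.

    This is the textbook bound for explicit 2D diffusion with dx == dy,
    used here only as a rough guide, not NEMO's exact criterion.
    """
    return math.sqrt(4.0 * a * dt)


# Values taken from the two namelist_cfg files in the diff above.
configs = {
    "jelt": {"rn_Uv": 0.01, "rn_Lv": 200.0, "rn_rdt": 20.0},
    "marpay": {"rn_Uv": 0.04, "rn_Lv": 5.0e3, "rn_rdt": 60.0},
}

for who, c in configs.items():
    a = laplacian_viscosity(c["rn_Uv"], c["rn_Lv"])
    dx = min_stable_dx(a, c["rn_rdt"])
    print(f"{who}: A = {a:g} m2/s, dt = {c['rn_rdt']:g} s, needs dx >= {dx:.1f} m")
# jelt: A = 1 m2/s, dt = 20 s, needs dx >= 8.9 m
# marpay: A = 100 m2/s, dt = 60 s, needs dx >= 154.9 m
```

Under these assumptions the two namelists differ by a factor of 100 in viscosity and 3 in time step, so the diffusive number A * dt / dx**2 is about 300 times larger in the second configuration.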
@jpolton I did a git pull, so I'm using the same files. It doesn't crash, but I don't get any output files. Is that because the run time in submit.slurm is too short? There's no difference between yours and mine, though.
You set the number of time steps to run in namelist_cfg. You set the wall time (allowable compute time) in submit.slurm. You set the output frequency in the file_def*xml files. Some combination of tuning these will give you output. You can check run.stat to get a summary of how the model is behaving. The aim of this run is to get something that time-steps and is stable. You now have that. The output will not be interesting. I recommend you move on to the tide-forced run.
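The number of steps and rn_rdt fix how much model time is simulated, while output_freq in file_def*xml sets how often XIOS writes, so a short run can legitimately produce no records. A rough sketch, assuming the namrun parameters nn_it000/nn_itend and a subset of XIOS duration strings ('1ts', '30mi', '1h', '1d'); output_freq_seconds and expected_records are hypothetical helpers, and the count ignores any XIOS details about when the first record is written.

```python
import re

# Seconds per unit for a subset of XIOS duration units (assumed; 'ts' = time step).
_UNIT_SECONDS = {"s": 1.0, "mi": 60.0, "h": 3600.0, "d": 86400.0}


def output_freq_seconds(freq: str, rn_rdt: float) -> float:
    """Convert an output_freq string such as '1ts', '30mi', '1h' or '1d' to seconds."""
    m = re.fullmatch(r"(\d+(?:\.\d+)?)(ts|mi|s|h|d)", freq.strip())
    if m is None:
        raise ValueError(f"unsupported output_freq: {freq!r}")
    count, unit = float(m.group(1)), m.group(2)
    return count * (rn_rdt if unit == "ts" else _UNIT_SECONDS[unit])


def expected_records(nn_it000: int, nn_itend: int, rn_rdt: float, freq: str) -> int:
    """Roughly how many output records a run of nn_it000..nn_itend steps should write."""
    run_seconds = (nn_itend - nn_it000 + 1) * rn_rdt
    return int(run_seconds // output_freq_seconds(freq, rn_rdt))


print(expected_records(1, 50, 20.0, "1h"))   # 0: 1000 s of model time is under an hour
print(expected_records(1, 50, 20.0, "1ts"))  # 50
```

For example, the 50-step, 20 s test earlier in the thread covers only 1000 s of model time, so an hourly file would get no records while a per-time-step file would get roughly 50.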
@mpayopayo You are correct. Your unforced run did not work.
I've just peeked inside:
/work/n01/n01/marpay/SEVERN-SWOT/RUN_DIRECTORIES/EXP_unforced
run.stat is empty. This file usually gives useful numbers about the maximum velocity and the like.
In ocean.output there is no evidence of time stepping. Usually you see "kt = .." when the time step increments. You don't have any of that.
Looking at submit.slurm, it says you are trying to run on the ACCORD budget. That might be a problem. I'd fix that and then see what happens.
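The two checks described above (an empty run.stat and no "kt =" lines in ocean.output) can be scripted for quick triage of a run directory. A minimal sketch; it assumes the "kt =" marker quoted above and the file names used in EXP_unforced, and diagnose and diagnose_run_dir are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import List, Optional

# Marker for a completed time step, as described above ("kt = ..").
_KT_RE = re.compile(r"\bkt\s*=")


def diagnose(run_stat: Optional[str], ocean_output: Optional[str]) -> List[str]:
    """Return warnings given the text of run.stat and ocean.output (None = missing)."""
    problems = []
    if not run_stat or not run_stat.strip():
        problems.append("run.stat is missing or empty")
    if ocean_output is None:
        problems.append("ocean.output is missing")
    elif not _KT_RE.search(ocean_output):
        problems.append("no 'kt =' lines in ocean.output: the model never time-stepped")
    return problems


def diagnose_run_dir(run_dir: str) -> List[str]:
    """Read run.stat and ocean.output from a run directory and diagnose them."""
    def read(name: str) -> Optional[str]:
        path = Path(run_dir) / name
        return path.read_text(errors="replace") if path.exists() else None

    return diagnose(read("run.stat"), read("ocean.output"))
```

Pointing diagnose_run_dir at an EXP_unforced directory gives an empty list only when both files look like a model that is actually time-stepping.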
@jpolton I just ran with CLASS as the budget and I'm getting the same:
-rw-r--r-- 1 marpay n01 0 Jul 9 09:32 timing.output
-rw-r--r-- 1 marpay n01 77051 Jul 9 09:32 layout.dat
-rw-r--r-- 1 marpay n01 307361 Jul 9 09:32 ocean.output
-rw-r--r-- 1 marpay n01 10 Jul 9 09:32 time.step
-rw-r--r-- 1 marpay n01 292 Jul 9 09:32 run.stat.nc
-rw-r--r-- 1 marpay n01 0 Jul 9 09:32 run.stat
-rw-r--r-- 1 marpay n01 913347 Jul 9 09:32 output.namelist.dyn
-rw-r--r-- 1 marpay n01 0 Jul 9 09:32 communication_report.txt
-rw-r--r-- 1 marpay n01 150963 Jul 9 09:42 slurm-381082.out
The 'good' news is that I can replicate this problem. I will see what I can now do to fix it.
@jpolton, just in case: I'm doing this with the bathymetry that is missing the southwest patch of sea.
@mpayopayo @micdom The issue appears to be with I/O (perhaps XIOS). The clue is that there is no error (XIOS is like that...) I rebuilt everything from scratch and got the same effects with EXP_unforced that @mpayopayo had.
I commented out the every-time-step output in file_def_nemo-oce.xml. This allowed the run to go to the end of the specified number of time steps, but then it hangs around like a rogue process. Some output goes in the cfl* file, but there should be more in ocean.output. Hmm. I will have a think about what might be going wrong.
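Commenting out per-time-step output by hand is easy to miss in a long file_def_nemo-oce.xml. This sketch lists the file elements that would still write every time step, assuming output_freq="1ts" marks such output, that the attribute may be inherited from an enclosing file_group, and that enabled=".FALSE." disables a file; per_timestep_files is a hypothetical helper. Commented-out elements are ignored because ElementTree drops XML comments by default.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def per_timestep_files(xml_text: str) -> List[Tuple[str, str]]:
    """Return (id or name, output_freq) for enabled <file> elements writing every step."""
    root = ET.fromstring(xml_text)
    hits: List[Tuple[str, str]] = []

    def walk(node: ET.Element, inherited_freq: Optional[str]) -> None:
        # A file_group's output_freq applies to its children unless overridden.
        freq = node.get("output_freq", inherited_freq)
        if node.tag == "file":
            enabled = node.get("enabled", "true").strip().strip(".").upper()
            if freq == "1ts" and enabled != "FALSE":
                hits.append((node.get("id") or node.get("name") or "?", freq))
        for child in node:
            walk(child, freq)

    walk(root, None)
    return hits
```

Running it on the file before and after editing is a quick way to confirm that no per-time-step file remains active.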
Also, I've updated the build domain wiki thing a bit more with an ARCHER2 method, to make things slicker.
@jpolton could it be something to do with the environments? C. Wilson mentioned that there was new advice from Adam Blake on which modules to load to compile. If it came between when you set up the model and when I set up the model, maybe that could be the cause? This is from the MSML Teams chat of 28 and 29 June:
module load cpe/21.03
module load cray-hdf5-parallel
module load cray-netcdf-hdf5parallel
export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH
I will try it now
@mpayopayo The model hanging is definitely a different issue. Here is a new ticket https://github.com/JMMP-Group/SEVERN-SWOT/issues/19
@jpolton, I've been trying to rerun the unforced model and the tides-only model, this time with the full bathymetry (i.e. including the SW bit, as done by @micdom). I've got segmentation faults, and jobs pending because no nodes are available/idle on ARCHER2. It seems more like a problem with ARCHER2 than with the runs themselves.
@mpayopayo Did you try the new queue decomposition method / new slurm script mentioned above? (https://github.com/JMMP-Group/SEVERN-SWOT/issues/19) I can't even log onto ARCHER2 at the moment!!
@jpolton I tried and failed, so I stuck with the "old" method. I can sort of log in, in that it accepts my password and passphrase, but then it just hangs there.
@mpayopayo Hmm, well, I took not being able to log in properly as a sign of issues beyond my control. In these situations, if things work then great, but if things don't work it is hard to identify the root cause.
@jpolton @micdom and it runs!!! I'm very confident that the problem was the bathymetry: with the full bathymetry it runs without problems. I'll update the wiki accordingly.
Does it timestep without forcing? Not a bad test to make sure ARCHER2 accounts are set up correctly.
Need to generate coordinates.bdy.nc using PyNEMO. Then this runs: https://github.com/JMMP-Group/SEVERN-SWOT/blob/master/SCRIPTS/run_unforced.sh