Closed jpolton closed 3 years ago
Chris passed on insights from Adam: replace the modules in the make_xios.sh, make_nemo.sh and submit.slurm scripts, changing from
module -s restore /work/n01/shared/acc/n01_modules/ucx_env
to
module load cpe/21.03
module load cray-hdf5-parallel
module load cray-netcdf-hdf5parallel
export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH
In the slurm script this goes above the OMP_NUM_THREADS=1
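The module swap above could be applied to each script with a small helper, sketched here under assumptions: `swap_modules` is a hypothetical name (not something in the repository), and it assumes the old `module -s restore ... ucx_env` line sits on a line of its own. The replacement lines are copied from the comment above.

```shell
#!/usr/bin/env bash
# Sketch only: swap_modules is a hypothetical helper, not part of the repository.
# It replaces the old "module -s restore ... ucx_env" line with the new module set
# and leaves every other line (e.g. OMP_NUM_THREADS=1) where it was.
swap_modules() {
    local script="$1"
    local tmp
    tmp="$(mktemp)"
    awk '
        /^[[:space:]]*module -s restore .*ucx_env/ {
            print "module load cpe/21.03"
            print "module load cray-hdf5-parallel"
            print "module load cray-netcdf-hdf5parallel"
            print "export LD_LIBRARY_PATH=$CRAY_LD_LIBRARY_PATH:$LD_LIBRARY_PATH"
            next
        }
        { print }
    ' "$script" > "$tmp" && cat "$tmp" > "$script"   # cat keeps the file permissions
    rm -f "$tmp"
}

# Possible usage (file locations are assumptions):
#   for f in make_xios.sh make_nemo.sh submit.slurm; do swap_modules "$f"; done
```

Because the new lines take the place of the old one, they stay above the OMP_NUM_THREADS=1 line in the slurm script as long as the old module line was already above it.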
Rebuilding nemo.exe and xios_server.exe as above on branch feature/new_module made no difference to the hanging.
Try investigating the allocation of nodes. E.g. from Chris:
/work/n01/n01/cwi/mkslurm_hetjob -S 8 -s 16 -m 2 -C 831 -g 16 -N 128 -t 00:10:00 -a n01-CLASS -j SE-NEMO > runscript_831.slurm
The number of cores and gaps are the main things to vary here. 831 and 16 are a sweet spot for eORCA025 but other options may be better for a different NEMO configuration. Scroll to the bottom of https://docs.archer2.ac.uk/research-software/nemo/nemo.html for info.
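To try several core and gap combinations, the commands could be generated in a loop. The sketch below is a dry run that only prints the command lines. It assumes `-C` and `-g` are the cores and gaps referred to above, and `print_sweep`, the output file naming and any candidate values passed to it are hypothetical. All other flags are copied unchanged from Chris's example.

```shell
#!/usr/bin/env bash
# Dry run: print one mkslurm_hetjob command per (cores, gap) pair.
# print_sweep and the runscript_<cores>_g<gap>.slurm naming are hypothetical.
print_sweep() {
    local cores_list="$1" gaps_list="$2" cores gap
    # Word splitting on the two lists is intentional.
    for cores in $cores_list; do
        for gap in $gaps_list; do
            echo "/work/n01/n01/cwi/mkslurm_hetjob -S 8 -s 16 -m 2 -C ${cores} -g ${gap} -N 128 -t 00:10:00 -a n01-CLASS -j SE-NEMO > runscript_${cores}_g${gap}.slurm"
        done
    done
}

# Possible usage (the alternative values are illustrative, not recommendations):
#   print_sweep "400 831" "8 16"
```

Printing rather than executing keeps the sweep reviewable before anything is generated or submitted.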
@mpayopayo @micdom IT IS RUNNING!! On a new branch https://github.com/JMMP-Group/SEVERN-SWOT/tree/feature/new_modules I tried a couple of things:
Together these got NEMO running and outputting again (though "1." above may not be necessary).
It didn't quite complete - possibly an issue with the domain, but this is progress.
As a minimum-effort test of whether only the new slurm script was needed, copy https://github.com/JMMP-Group/SEVERN-SWOT/blob/feature/new_modules/RUN_DIRECTORIES/EXP_unforced/submit.slurm and swap the jelt in line 28. Also swap the modules back (line 16 instead of lines 17-20), and swap n01-ACCORD in line 6.
Unfortunately I have rebuilt my NEMO and XIOS executables using the new modules, so I haven't tested whether they were important or not.
@jpolton @mpayopayo I'm a bit lost. I'm trying to run the unforced run without the boundary file (which I didn't manage to build). I got an output, not sure what I got though... but I didn't change the submit.slurm script...
@micdom Can you do a chmod a+rx -R /work/n01/n01/micdom
@jpolton @micdom OK I'll try now just with the new submit slurm
@jpolton @mpayopayo done chmod a+rx -R /work/n01/n01/micdom
@jpolton maybe silly, but I'm not at ease yet with git, do I have to do the test in a new branch?
Looks like @micdom is the winner so far. Even got RESTART files written!! The run log is ocean.output. The XIOS output (defined in field_def_nemo-oce.xml) is SEVERN_unforced_1d_t.nc
Well done
@jpolton maybe silly, but I'm not at ease yet with git, do I have to do the test in a new branch?
You could copy off the web page and paste it into your file.
@micdom
Elevations ~1e-12 m after 288 steps without forcing. Good job.
Riding high on this success, I'm calling it quits for the week before something goes wrong!
@jpolton I'm getting a segmentation fault, maybe because I used different modules for compiling and running? I'm running with the bathy that is missing the SW bit. If @micdom is running with the "full" bathy, and I did not have problems with your bathy, could it all come from the bathy and the domain?
I'll try next week generating the bathy again.
@jpolton @mpayopayo I'm using a different bathy with the SW bit!
dout.variables['elevation'][0:99,:] = 0
dout.variables['elevation'][0:200,650::] = 0
For the rest I've followed the instructions, made a last pull this afternoon, and just changed ln_bdy=.false. and nn_itend=288 in the namelist_cfg.
have a nice weekend!
@jpolton @mpayopayo I have not updated the wiki for the unforced run, but maybe I should.
The unforced run can be done without creating the boundary file first.
It is sufficient to change ln_bdy=.false in the namelist_cfg.
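The namelist change can also be scripted. This is a minimal sketch, assuming the namelist has a single `ln_bdy = ...` assignment on a line of its own, possibly followed by a `!` comment. `disable_bdy` is a hypothetical helper name, and it writes the standard Fortran logical form `.false.`.

```shell
#!/usr/bin/env bash
# Sketch only: disable_bdy is a hypothetical helper, not part of the repository.
# It rewrites the value assigned to ln_bdy and keeps spacing, trailing comments
# and all other namelist lines unchanged.
disable_bdy() {
    local namelist="$1"
    local tmp
    tmp="$(mktemp)"
    sed -E 's/^([[:space:]]*ln_bdy[[:space:]]*=[[:space:]]*)[^[:space:]!]*/\1.false./' "$namelist" > "$tmp" \
        && cat "$tmp" > "$namelist"
    rm -f "$tmp"
}

# Possible usage (the path is an assumption):
#   disable_bdy namelist_cfg
```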
The section of the wiki Run Unforced can go before Make tidal boundary conditions.
@jpolton, @micdom I'm redoing the bathy and the unforced run again; I'm happy to modify the wiki afterwards.
@jpolton it hangs/gives a segmentation fault with the cropped bathymetry but not with the full bathymetry. So I think that is where the problem is.
Model seems to run but hangs without terminating properly