fernandadialzira closed 2 months ago
@fernandadialzira sorry for your troubles and thanks for the feedback. The documentation was written mostly for the master branch, and a few features of the development branch are not documented yet, so your report helps.
A few quick comments:
For the run part, the missing libmkl_intel_lp64.so.2 is related to Slurm not finding /work/ab0246/a270179/runtime/awicm3-v3.1/model_codes/fesom_standalone/fesom2/env/levante.dkrz.de/shell.intel.
This is puzzling, as this file is also needed and used for compilation. Did you compile the model directly in your fesom2 directory using ./configure.sh (and in that case, did the configure script find the same shell file?), or did you compile the model using esm-tools?
Slurm logs are written by default to the directory where you submit the job. If you want them in the results directory, an easy trick is to copy the Slurm batch script into the results directory and submit it from there.
Hi @suvarchal!
I compiled the model directly in the fesom2 directory using bash -l ./configure.sh, as in the documentation, and it worked fine once I was on the refactoring branch. Do you also have any clue about the other questions in the issue?
Thank you for the advice on the slurm logs!
Hi @fernandadialzira & @mandresm
... so you use ESM-Tools, right? There is a typo in the pathname /work/ab0246/a270179/runtime/awicm3-v3.1/model_codes/fesom_standalone/fesom2/env/levante.dkrz.de/shell.intel (fesom_standalone, with an underscore); the real path to that file seems to be /work/ab0246/a270179/runtime/awicm3-v3.1/model_codes/fesom-standalone/fesom2/env/levante.dkrz.de/shell.intel (fesom-standalone, with a hyphen). That's why it can't be found. In that case, I guess the problem is somewhere in ESM-Tools, when it builds its directory tree!
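A mismatch like this can be caught before submitting the job. The following is only a minimal sketch, assuming the batch script loads its environment through plain `source <path>` (or `. <path>`) lines with literal paths; the function name check_sourced_files is hypothetical and not part of FESOM or ESM-Tools.

```shell
# Report every file that a batch script tries to source but that does not exist.
# Assumption: paths are literal (no variables) and contain no spaces.
check_sourced_files() {
    script=$1
    status=0
    for path in $(awk '$1 == "source" || $1 == "." { print $2 }' "$script"); do
        if [ ! -f "$path" ]; then
            echo "missing: $path"
            status=1
        fi
    done
    return "$status"
}
```

Calling check_sourced_files job_levante before sbatch would print one "missing: ..." line per sourced file that cannot be found, and return a non-zero status if there is any.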
I don't think she is using ESM-Tools, otherwise she wouldn't be using the job_levante script.
Right, I am not using esm-tools. But it is good that I was only stuck because of a typo. I am going to continue by fixing it and will let you know if there are any other errors.
It looks like you downloaded FESOM with esm-tools. For a "clear experiment" it would probably be better to clone it directly from the repo?
Hi, I did try with esm-tools before, but inside /work/ab0246/a270179/runtime/awicm3-v3.1/model_codes/fesom_standalone/fesom2 I was trying the "clear experiment", cloning from the repo and following the instructions.
As I wanted to fix the typo pointed out by @patrickscholz, I deleted the repo and tried to clone and compile the model again. This time the compilation failed with the message:
ld: cannot find -lFALSE
make[2]: *** [src/CMakeFiles/fesom.dir/build.make:1642: src/fesom] Error 1
make[2]: Leaving directory '/work/ab0246/a270179/runtime/awicm3-v3.1/model_codes/fesom2/build'
make[1]: *** [CMakeFiles/Makefile2:140: src/CMakeFiles/fesom.dir/all] Error 2
make[1]: Leaving directory '/work/ab0246/a270179/runtime/awicm3-v3.1/model_codes/fesom2/build'
make: *** [Makefile:139: all] Error 2
As a result, fesom.x is not created and I can't try to run the model again. My only idea is that simply deleting the folder does not do the trick. Is there a better way to uninstall the model and try again from a clean state?
@fernandadialzira sorry for the late response.
Can you please try ./configure.sh -DBLA_VENDOR=Intel10_64lp? (My suspicion is that discovering BLAS works differently, or is harder, with newer versions of imkl than it used to.)
@suvarchal thank you for your comment!
It worked, but only with a fully clean installation, so now my fesom standalone directory is /work/ab0246/a270179/runtime/awicm3-v3.1/model_codes/fesom_standalone2/fesom2. With that, I was able to run 10 model years, and the output looks reasonable for a non-equilibrated run (in /work/ab0246/a270179/runtime/awicm3-v3.1/experiments_testing/mesh_sln_003).
[Figure: Mid-Pliocene SST (new mesh)]
[Figure: Pre-Industrial SST (280 ppmv)]
However, to get this far I had to make some changes, and this is perhaps my contribution to the documentation:
If one wants to simulate more than 1 year in one go, one needs to set #SBATCH --time=00:30:00 to a higher value in fesom2/work/job_levante. @JanStreffing has taught me that I could uncomment the last lines and the job would resubmit itself, but we did not know how to set an end date for the simulation. Therefore, for 10 years, I set the time to #SBATCH --time=04:30:00.
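One possible way to give a self-resubmitting job an end date is to look at fesom.clock before calling sbatch again. This is only a sketch under stated assumptions: that the last line of fesom.clock has the form "<seconds> <day> <year>" and holds the date at which the next run segment would start, and that run_length covers one segment per job. The function should_resubmit and its end_year argument are hypothetical and not part of job_levante.

```shell
# Decide whether a self-resubmitting job should submit itself again.
# Assumption: the last non-empty line of fesom.clock is "<seconds> <day> <year>".
should_resubmit() {
    clock_file=$1
    end_year=$2   # hypothetical parameter: last year that should be simulated
    next_year=$(awk 'NF >= 3 { year = $3 } END { print year }' "$clock_file")
    [ -n "$next_year" ] && [ "$next_year" -le "$end_year" ]
}

# Possible use at the end of a job script (left commented out in this sketch):
# if should_resubmit /path/to/results/fesom.clock 1999; then sbatch job_levante; fi
```

With this check the wall-clock limit only has to cover one segment, and the chain of jobs stops once the clock has moved past the chosen end year.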
I had to add to job_levante the copying of namelist.tra, namelist.io, namelist.dyn and namelist.cvmix:
cp -n ../config/namelist.tra .
cp -n ../config/namelist.io .
cp -n ../config/namelist.cvmix .
cp -n ../config/namelist.dyn .
Otherwise I would get errors like:
For namelist.tra and namelist.io, in slurm-err.out:
forrtl: severe (24): end-of-file during read, unit -131, file /work/ab0246/a270179/runtime/awicm3-v3.1/model_codes/fesom_standalone2/fesom2/work/namelist.tra
In fesom2-0.out:
end level area_test
0: ERROR: --> bad opening file : namelist.dyn ; iostat= 29
0: fesom should stop with exit status = 0
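The copying step described above could also be written as a small function that stops early when a namelist is missing, instead of letting the model fail later. This is a minimal sketch, not the content of job_levante; the function name copy_namelists is hypothetical, and the check for an empty file is just a basic sanity check, not a diagnosis of the forrtl error.

```shell
# Copy the four namelists into the work directory without overwriting
# existing copies (like cp -n), and fail if one is missing or empty.
copy_namelists() {
    config_dir=$1
    work_dir=$2
    for name in tra io dyn cvmix; do
        src="$config_dir/namelist.$name"
        dst="$work_dir/namelist.$name"
        [ -e "$dst" ] || cp "$src" "$dst" || return 1
        if [ ! -s "$dst" ]; then
            echo "namelist.$name is missing or empty in $work_dir" >&2
            return 1
        fi
    done
}
```

In the job script this would be called as copy_namelists ../config . before the model is started.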
namelist.forcing also needs to be edited. At the beginning, based on the documentation, I had only changed namelist.config to:
&timestep
step_per_day=32 !96 !96 !72 !72 !45 !72 !96
run_length=10 !62 !62 !62 !28
run_length_unit='y' ! y, m, d, s
/
&clockinit ! the model starts at
timenew=0.0
daynew=1
yearnew=1990
/
&paths
MeshPath='/work/ab0246/a270179/runtime/awicm3-v3.1/input/fesom2/midpli/'
ClimateDataPath='/pool/data/AWICM/FESOM2/INITIAL/phc3.0/'
ResultPath='/work/ab0246/a270179/runtime/awicm3-v3.1/experiments_testing/mesh_sln_003/'
/
For namelist.forcing, I did:
&nam_sbc
nm_xwind_file = '/pool/data/AWICM/FESOM2/FORCING/JRA55-do-v1.4.0/uas.' ! name of file with wind speeds x
nm_ywind_file = '/pool/data/AWICM/FESOM2/FORCING/JRA55-do-v1.4.0/vas.' ! name of file with wind speeds y
nm_xstre_file = '/pool/data/AWICM/FESOM2/FORCING/JRA55-do-v1.4.0/uas.' ! name of file with wind stress x
nm_ystre_file = '/pool/data/AWICM/FESOM2/FORCING/JRA55-do-v1.4.0/vas.' ! name of file with wind stress y
nm_humi_file = '/pool/data/AWICM/FESOM2/FORCING/JRA55-do-v1.4.0/huss.' ! name of file with humidity
nm_qsr_file = '/pool/data/AWICM/FESOM2/FORCING/JRA55-do-v1.4.0/rsds.' ! name of file with solar heat
nm_qlw_file = '/pool/data/AWICM/FESOM2/FORCING/JRA55-do-v1.4.0/rlds.' ! name of file with Long wave
nm_tair_file = '/pool/data/AWICM/FESOM2/FORCING/JRA55-do-v1.4.0/tas.' ! name of file with 2m air temperature
nm_prec_file = '/pool/data/AWICM/FESOM2/FORCING/JRA55-do-v1.4.0/prra.' ! name of file with total precipitation
nm_snow_file = '/pool/data/AWICM/FESOM2/FORCING/JRA55-do-v1.4.0/prsn.' ! name of file with snow precipitation
nm_mslp_file = '/pool/data/AWICM/FESOM2/FORCING/JRA55-do-v1.4.0/psl.' ! air_pressure_at_sea_level
nm_xwind_var = 'uas' ! name of variable in file with wind
nm_ywind_var = 'vas' ! name of variable in file with wind
nm_xstre_var = 'uas' ! name of variable in file with wind
nm_ystre_var = 'vas' ! name of variable in file with wind
nm_humi_var = 'huss' ! name of variable in file with humidity
nm_qsr_var = 'rsds' ! name of variable in file with solar heat
nm_qlw_var = 'rlds' ! name of variable in file with Long wave
nm_tair_var = 'tas' ! name of variable in file with 2m air temperature
nm_prec_var = 'prra' ! name of variable in file with total precipitation
nm_snow_var = 'prsn' ! name of variable in file with total precipitation
nm_mslp_var = 'psl' ! name of variable in file with air_pressure_at_sea_level
nm_nc_iyear = 1900
nm_nc_imm = 1 ! initial month of time axis in netCDF
nm_nc_idd = 1 ! initial day of time axis in netCDF
nm_nc_freq = 1 ! data points per day (i.e. 86400 if the time axis is in seconds)
nm_nc_tmid = 0 ! 1 if the time stamps are given at the mid points of the netcdf file, 0 otherwise (i.e. 1 in CORE1, CORE2; 0 in JRA55)
l_xwind=.true.
l_ywind=.true.
l_xstre=.false.
l_ystre=.false.
l_humi=.true.
l_qsr=.true.
l_qlw=.true.
l_tair=.true.
l_prec=.true.
l_mslp=.false.
l_cloud=.false.
l_snow=.true.
runoff_data_source ='CORE2' !Dai09, CORE2
nm_runoff_file ='/pool/data/AWICM/FESOM2/FORCING/JRA55-do-v1.4.0/CORE2_runoff.nc'
!nm_runoff_file ='/work/ollie/qwang/FESOM2_input/mesh/CORE2_finaltopo_mean/forcing_data_on_grid/runoff_clim.nc'
!runoff_data_source ='Dai09' !Dai09, CORE2, JRA55
!runoff_climatology =.true.
sss_data_source ='CORE2'
nm_sss_data_file ='/pool/data/AWICM/FESOM2/FORCING/JRA55-do-v1.4.0/PHC2_salx.nc'
chl_data_source ='None' !'Sweeney' monthly chlorophyll climatology or 'NONE' for constant chl_const (below). Make use_sw_pene=.TRUE. in namelist.config!
nm_chl_data_file ='/pool/data/AWICM/FESOM2/FORCING/Sweeney/Sweeney_2005.nc'
chl_const = 0.1
/
Of course, these paths are specific to levante. One needs to look in fesom2/setups/paths.yml to find the paths for other machines.
I believe that this information should be clearly stated in the documentation, and that job_levante needs to include the copying of these other namelists.
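Because the paths in the namelists are machine-specific, a quick check before submitting can catch a wrong one. The following is a sketch only, assuming that paths appear as single-quoted absolute strings without spaces and that fully commented-out lines start with "!"; the function name check_namelist_paths is hypothetical. File prefixes such as '.../uas.' are checked only at directory level, since the full file names are built by the model.

```shell
# Check that every quoted absolute path in a namelist points to an existing
# directory (for file names and file prefixes, the parent directory is checked).
check_namelist_paths() {
    namelist=$1
    status=0
    # Skip lines that are entirely comments, then extract '/...' strings.
    for path in $(grep -v '^[[:space:]]*!' "$namelist" | grep -o "'/[^']*'" | tr -d "'"); do
        case $path in
            */) dir=$path ;;
            *)  dir=$(dirname "$path") ;;
        esac
        if [ ! -d "$dir" ]; then
            echo "directory not found: $dir (from $path)"
            status=1
        fi
    done
    return "$status"
}
```

Running check_namelist_paths namelist.forcing and check_namelist_paths namelist.config in the work directory would list every directory that does not exist on the current machine.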
Hi @fernandadialzira. Thanks a lot for sharing your experience and your suggestions with us. Why don't you have a go at improving the docs yourself? This is usually best done by people who have fresh experience. I added you to the repo, so you just have to make a branch from refactoring and edit the docs. Most of the things you mention should probably go to:
https://github.com/FESOM/fesom2/blob/refactoring/docs/getting_started/getting_started.rst
If you have the time and willingness to do it, please give it a try and make a PR. I will be happy to help you with that.
Hi @koldunovn! I will try it on Monday.
I think it is a good idea, also so that I can practice those things.
I need to do a fesom standalone simulation for a new mesh representing the mid-Pliocene ocean conditions. This issue reports the errors I encountered while following the documentation, as discussed with @koldunovn, @pgierz, @patrickscholz, and Sesh (I don't know his username).
Machine: levante
Branch: refactoring
Mesh: /work/ab0246/a270179/runtime/awicm3-v3.1/input/fesom2/midpli/
Model directory: /work/ab0246/a270179/runtime/awicm3-v3.1/model_codes/fesom-standalone/fesom2/
First step: build model executable
It is necessary to switch from the master to the refactoring branch, otherwise it does not update the available machines for submitting the job. After doing this, the mesh partitioning was easy to perform following the documentation.
Second step: running the model
[...] core2 or midpli?
[...] /work/ab0246/a270179/runtime/awicm3-v3.1/model_codes/fesom-standalone/fesom2/results/ for results and created a fesom.clock there using the start year of 1990 instead of 1958. Is that ok?
namelist.config:
&clockinit ! the model starts at
timenew=0.0
daynew=1
yearnew=1990
/
&paths
MeshPath='/work/ab0246/a270179/runtime/awicm3-v3.1/input/fesom2/midpli/'
ClimateDataPath='/pool/data/AWICM/FESOM2/INITIAL/phc3.0/'
ResultPath='/work/ab0246/a270179/runtime/awicm3-v3.1/experiments_testing/fesom-standalone/'
/
#SBATCH --job-name=midpli_test1
#SBATCH -p compute
#SBATCH --ntasks-per-node=108
#SBATCH --ntasks=432
#SBATCH --time=00:30:00
#SBATCH -o slurm-out.out
#SBATCH -e slurm-err.out
#SBATCH -A ab0246
/var/spool/slurmd/job6868133/slurm_script: line 14: /work/ab0246/a270179/runtime/awicm3-v3.1/model_codes/fesom_standalone/fesom2/env/levante.dkrz.de/shell.intel: No such file or directory
217: fesom.x: error while loading shared libraries: libmkl_intel_lp64.so.2: cannot open shared object file: No such file or directory