NGEET / fates-containers

Repository for containerized version of fates for use in future tutorials

Enhancement: First pass at end-to-end site-scale containerized HLM-FATES workflow #19

Open serbinsh opened 4 years ago

serbinsh commented 4 years ago

Plan: build on my older example that uses local met drivers (https://github.com/serbinsh/ctsm_containers/wiki/Example-CTSM-FATES-(CLM5-FATES)-run:-PA-SLZ-using-NGEE-Tropics-driver-files), but instead provide the default GSWP3 drivers extracted for the SLZ site, together with single-point (i.e., 1-pixel) surface and other forcing files, as a packaged product that end users can easily download for experimentation.

Step 1: Using the existing example script and container (docker pull serbinsh/ctsm_containers:ctsm-fates_next_api-fates_sci.1.23.0_api.7.1.0), generate a new build script that uses pre-extracted default single-point forcing files together with the full-resolution surface/domain/ndep etc. files, to identify the full set of required inputs. Start with a lower-resolution grid and the I2000Clm50FatesGs compset.
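A minimal sketch of a generator for the Step 1 build script, assuming the container layout visible in the logs later in this thread (CIME under /ctsm/cime, inputs under /data, cases under /ctsm_output, machine name docker). The function name write_build_script, the f09_g16 default resolution alias (picked only to match the 0.9x1.25/gx1v6 surface and domain files) and the use of --run-unsupported are assumptions, not the project's actual script.

```shell
#!/usr/bin/env bash
# Hypothetical generator for a Step 1 build script; prints the script to stdout.
# Assumptions: container paths /ctsm, /data, /ctsm_output and machine "docker"
# as seen in the logs; f09_g16 and --run-unsupported should be checked locally.
write_build_script() {
  local case_name=$1
  local compset=${2:-I2000Clm50FatesGs}
  local res=${3:-f09_g16}   # assumption: pass a coarser alias for the first low-res test
  cat <<EOF
#!/usr/bin/env bash
set -e
/ctsm/cime/scripts/create_newcase --case /ctsm_output/${case_name} \\
  --compset ${compset} --res ${res} --mach docker --run-unsupported
cd /ctsm_output/${case_name}
# point CLM at the pre-extracted single-point surface file
echo "fsurdat = '/data/single_point/PA-SLZ/surfdata_0.9x1.25_16pfts_Irrig_CMIP6_simyr2000_c170824_PA-SLZ.nc'" >> user_nl_clm
./case.setup
./case.build
EOF
}
```

For example, `write_build_script CLM5-FATES_PA-SLZ > build_pa_slz.sh` writes a script that can then be run inside the container. Generating the script rather than running the commands directly keeps a record of exactly which inputs each test case used.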

Step 2: Update the existing Python script for extracting single-point drivers and surface/domain files so that it also works with all other ancillary inputs. Generate a new CESM input data folder containing all required inputs, but at 1 pixel.

Step 3: Modify the example script to run at SLZ using the full set of single-point inputs, and test it.

Step 4: Package the draft input datasets as a tar.gz and upload them to OSF. Test pulling them down and running locally on different machines using Docker and Singularity. Write up example notes.
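A minimal sketch of the packaging half of Step 4, assuming each site's single-point inputs live in one directory per site, as in /data/single_point/PA-SLZ. package_site is a hypothetical helper; the OSF upload itself is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: package <data_root>/<site> as <out_dir>/<site>.tar.gz,
# then list the archive back to confirm it is readable. Prints the tarball path.
package_site() {
  local data_root=$1 site=$2 out_dir=${3:-.}
  local tarball="${out_dir}/${site}.tar.gz"
  if [ ! -d "${data_root}/${site}" ]; then
    echo "missing ${data_root}/${site}" >&2
    return 1
  fi
  tar -czf "$tarball" -C "$data_root" "$site"   # archive paths start at <site>/
  tar -tzf "$tarball" > /dev/null || return 1   # fails if the archive is unreadable
  echo "$tarball"
}
```

For example, `package_site /data/single_point PA-SLZ /tmp` produces /tmp/PA-SLZ.tar.gz. On another machine, `tar -xzf PA-SLZ.tar.gz -C <data root>/single_point` restores the same directory layout.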

Step 5: Have other users beta test the script, container, and driver data.

Step 6: Update to run with the NGEE versions of the HLM-FATES containers.

Step 7: Add the full example to the wiki page.

serbinsh commented 4 years ago

Required datasets:

CTSM and FATES parameter files (small enough to auto-download)

lnd_in:
fsnowaging = '/data//lnd/clm2/snicardata/snicar_drdt_bst_fit_60_c070416.nc'
fsnowoptics = '/data//lnd/clm2/snicardata/snicar_optics_5bnd_c090915.nc'
stream_fldfilename_urbantv = '/data//lnd/clm2/urbandata/CLM50_tbuildmax_Oleson_2016_0.9x1.25_simyr1849-2106_c160923.nc'

datm.streams.txt.presaero.clim_2000 (the big one, 6.21 GB): aerosoldep_WACCM.ensmean_monthly_hist_1849-2015_0.9x1.25_CMIP6_c180926.nc

datm.streams.txt.topo.observed: topodata_0.9x1.25_USGS_070110_stream_c151201.nc

I forgot that the run lists these files during case submission, which is probably the most complete place to look:

Finished creating component namelists
Checking that inputdata is available as part of case submission
Loading input file list: 'Buildconf/cpl.input_data_list'
Loading input file list: 'Buildconf/datm.input_data_list'
Loading input file list: 'Buildconf/mosart.input_data_list'
Loading input file list: 'Buildconf/clm.input_data_list'
Model clm no file specified for finidat
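The same lists can be scanned before submitting, to build the inventory of required inputs. A minimal sketch, assuming each Buildconf/*.input_data_list entry is a single `name = path` line with spaces around the `=` (as in the clm.input_data_list excerpt further down this thread). check_input_lists is a hypothetical helper; CIME's own check_input_data script in the case directory remains the authoritative check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report each entry in <caseroot>/Buildconf/*.input_data_list
# as "ok" or "MISSING"; exits non-zero if any listed file is absent.
# Assumes "name = path" lines (whitespace-separated around "=").
check_input_lists() {
  local caseroot=$1 list name sep path missing=0
  for list in "$caseroot"/Buildconf/*.input_data_list; do
    [ -e "$list" ] || continue                 # glob matched nothing
    while read -r name sep path || [ -n "$name" ]; do
      [ -n "$path" ] || continue               # blank entries, e.g. finidat on a cold start
      if [ -e "$path" ]; then
        printf '%-8s%s: %s\n' ok "$(basename "$list")" "$name"
      else
        printf '%-8s%s: %s -> %s\n' MISSING "$(basename "$list")" "$name" "$path"
        missing=$((missing + 1))
      fi
    done < "$list"
  done
  [ "$missing" -eq 0 ]                         # non-zero exit if anything is missing
}
```

For example, after the namelists have been generated (e.g. with ./preview_namelists), running `check_input_lists /ctsm_output/CLM5-FATES_1599235597_PA-SLZ` inside the container lists every input the case expects.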

serbinsh commented 4 years ago

So far the file sizes look good. The datm, surface, and domain files for SLZ come to

9.2M Sep  4 11:18 PA-SLZ.tar.gz

So hopefully adding single-pixel versions of the other files won't add much more space.

glemieux commented 4 years ago

@serbinsh, do you use GitHub project boards? It looks like we could break this up into individual issues and organize them on a board.

serbinsh commented 4 years ago

I haven't before, but sure, we can try that!

serbinsh commented 4 years ago

Interesting. I'm using the ctsm-fates container listed above with the machine files mapped in from my original ctsm_containers GitHub repo, and then running with:

./xmlchange NTASKS_CPL=1,ROOTPE_CPL=1,NTHRDS_CPL=1
./xmlchange NTASKS_LND=1,ROOTPE_LND=3,NTHRDS_LND=1
./xmlchange NTASKS_OCN=1,ROOTPE_OCN=1,NTHRDS_OCN=1
./xmlchange NTASKS_ICE=1,ROOTPE_ICE=1,NTHRDS_ICE=1
./xmlchange NTASKS_GLC=1,ROOTPE_GLC=1,NTHRDS_GLC=1
./xmlchange NTASKS_ROF=1,ROOTPE_ROF=1,NTHRDS_ROF=1
./xmlchange NTASKS_WAV=1,ROOTPE_WAV=1,NTHRDS_WAV=1
./xmlchange NTASKS_ESP=1,ROOTPE_ESP=1,NTHRDS_ESP=1

It is running with -np 4 and -npernode 4, and according to the CPU utilization, Docker's hyperkit process is at ~300% (i.e., about 3 cores for the lnd model).

[Screenshot, 2020-09-04 12:33 PM: CPU utilization]
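A worked check on where the 4 MPI ranks come from, assuming CIME sizes the job as the highest rank any component occupies plus one, with the default PSTRID=1: ROOTPE_LND=3 with NTASKS_LND=1 places the land model on rank 3, so four ranks in total, while every other component sits on rank 1. total_tasks and its COMP:NTASKS:ROOTPE argument format are hypothetical, written only to show the arithmetic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate total MPI tasks for a PE layout given as
# COMP:NTASKS:ROOTPE triples. Assumes PSTRID=1, so a component occupies
# ranks ROOTPE .. ROOTPE+NTASKS-1 and the job needs max(ROOTPE+NTASKS) ranks.
total_tasks() {
  local max=0 spec comp ntasks rootpe end
  for spec in "$@"; do
    IFS=: read -r comp ntasks rootpe <<< "$spec"
    end=$(( rootpe + ntasks ))
    (( end > max )) && max=$end
  done
  echo "$max"
}

# The layout set with xmlchange above:
total_tasks CPL:1:1 LND:1:3 OCN:1:1 ICE:1:1 GLC:1:1 ROF:1:1 WAV:1:1 ESP:1:1
```

Under that assumption the layout prints 4, consistent with the `mpirun -np 4` command in the log below.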

Check case OK
submit_jobs case.run
Submit job case.run
Starting job script case.run
Generating namelists for /ctsm_output/CLM5-FATES_1599235597_PA-SLZ
Creating component namelists
  2020-09-04 16:32:03 atm
   Calling /ctsm/cime/src/components/data_comps/datm/cime_config/buildnml
  2020-09-04 16:32:04 lnd
   Calling /ctsm/cime_config/buildnml
WARNING: CLM is starting up from a cold state
  2020-09-04 16:32:04 ice
   Calling /ctsm/cime/src/components/stub_comps/sice/cime_config/buildnml
  2020-09-04 16:32:04 ocn
   Calling /ctsm/cime/src/components/stub_comps/socn/cime_config/buildnml
  2020-09-04 16:32:04 rof
   Calling /ctsm/components/mosart//cime_config/buildnml
  2020-09-04 16:32:05 glc
   Calling /ctsm/cime/src/components/stub_comps/sglc/cime_config/buildnml
  2020-09-04 16:32:05 wav
   Calling /ctsm/cime/src/components/stub_comps/swav/cime_config/buildnml
  2020-09-04 16:32:05 esp
   Calling /ctsm/cime/src/components/stub_comps/sesp/cime_config/buildnml
  2020-09-04 16:32:05 cpl
   Calling /ctsm/cime/src/drivers/mct/cime_config/buildnml
Finished creating component namelists
-------------------------------------------------------------------------
 - Prestage required restarts into /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run
 - Case input data directory (DIN_LOC_ROOT) is /data
 - Checking for required input datasets in DIN_LOC_ROOT
-------------------------------------------------------------------------
run command is mpirun -np 4 -npernode 4 /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/bld/cesm.exe  >> cesm.log.$LID 2>&1

This uses the single-point extracted GSWP3 drivers and the associated domain file, plus

fsurdat = /data/single_point/PA-SLZ/surfdata_0.9x1.25_16pfts_Irrig_CMIP6_simyr2000_c170824_PA-SLZ.nc
fatmlndfrc = /data/single_point/PA-SLZ/domain.lnd.fv0.9x1.25_gx1v6.090309_PA-SLZ.nc
fsnowaging = /data/lnd/clm2/snicardata/snicar_drdt_bst_fit_60_c070416.nc
finidat =  
fates_paramfile = /data/lnd/clm2/paramdata/fates_params_default_2trop.c190205.nc
fsnowoptics = /data/lnd/clm2/snicardata/snicar_optics_5bnd_c090915.nc
paramfile = /data/lnd/clm2/paramdata/clm5_params.c171117.nc
stream_fldfilename_urbantv = /data/lnd/clm2/urbandata/CLM50_tbuildmax_Oleson_2016_0.9x1.25_simyr1849-2106_c160923.nc

as found in clm.input_data_list

as found in datm.input_data_list
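For Step 2's new input-data folder, a minimal sketch that copies each file named in an input_data_list from the full /data tree into a new root while keeping the relative layout, so that DIN_LOC_ROOT could be pointed at the smaller tree. It copies files unchanged (the single-point subsetting itself belongs to the Python extraction script, not shown); stage_inputs and its arguments are hypothetical, and it assumes the same `name = path` line format as the list above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy files listed in an input_data_list from src_root
# into dest_root, preserving paths relative to src_root. Prints each copied
# relative path; entries outside src_root are reported on stderr and skipped.
stage_inputs() {
  local list=$1 src_root=${2%/} dest_root=${3%/} name sep path rel
  while read -r name sep path || [ -n "$name" ]; do
    [ -n "$path" ] || continue                 # skip blank entries such as finidat
    case "$path" in
      "$src_root"/*) rel=${path#"$src_root"/} ;;
      *) echo "skipping $name: $path is outside $src_root" >&2; continue ;;
    esac
    mkdir -p "$dest_root/$(dirname "$rel")"
    cp -p "$path" "$dest_root/$rel"
    echo "$rel"
  done < "$list"
}
```

For example, `stage_inputs Buildconf/clm.input_data_list /data /tmp/PA-SLZ-inputdata` would copy the parameter, snow, and urban files above under /tmp/PA-SLZ-inputdata/lnd/clm2/..., ready to be packaged alongside the single-point files.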

serbinsh commented 4 years ago

OK, it definitely seems to be running fine.

[Screenshot, 2020-09-04 12:46 PM]

serbinsh commented 4 years ago

The first case ran OK, but strangely I got some warnings during archiving. Something to look into later, as the files were properly moved to the /history/lnd location.

run command is mpirun -np 4 -npernode 4 /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/bld/cesm.exe  >> cesm.log.$LID 2>&1
check for resubmit
dout_s True
mach docker
resubmit_num 0
Submit job case.st_archive
Starting job script case.st_archive
st_archive starting
moving /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/lnd.log.200904-163203.gz to /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/logs/lnd.log.200904-163203.gz
moving /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/cpl.log.200904-163203.gz to /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/logs/cpl.log.200904-163203.gz
moving /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/cesm.log.200904-163203.gz to /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/logs/cesm.log.200904-163203.gz
moving /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/atm.log.200904-163203.gz to /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/logs/atm.log.200904-163203.gz
moving /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/rof.log.200904-163203.gz to /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/logs/rof.log.200904-163203.gz
-------------------------------------------
Archiving restarts for date date(1852, 1, 1, 0, 0, 0)
-------------------------------------------
Archiving restarts for datm (atm)
writing rpointer_file /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/rest/1852-01-01-00000/rpointer.atm
Archiving restarts for clm (lnd)
writing rpointer_file /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/rest/1852-01-01-00000/rpointer.lnd
moving file /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/CLM5-FATES_1599235597_PA-SLZ.clm2.r.1852-01-01-00000.nc to /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/rest/1852-01-01-00000/CLM5-FATES_1599235597_PA-SLZ.clm2.r.1852-01-01-00000.nc
copying /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/CLM5-FATES_1599235597_PA-SLZ.clm2.h0.1852-01-01-00000.nc to /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/rest/1852-01-01-00000/CLM5-FATES_1599235597_PA-SLZ.clm2.h0.1852-01-01-00000.nc
 WARNING: ncdump -v locfnh /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/CLM5-FATES_1599235597_PA-SLZ.clm2.rh0.1852-01-01-00000.nc  failed rc=1
    out=
    err=ncdump: locfnh: No such variable
moving file /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/CLM5-FATES_1599235597_PA-SLZ.clm2.rh0.1852-01-01-00000.nc to /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/rest/1852-01-01-00000/CLM5-FATES_1599235597_PA-SLZ.clm2.rh0.1852-01-01-00000.nc
Archiving restarts for mosart (rof)
writing rpointer_file /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/rest/1852-01-01-00000/rpointer.rof
Archiving restarts for drv (cpl)
writing rpointer_file /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/rest/1852-01-01-00000/rpointer.drv
moving file /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/CLM5-FATES_1599235597_PA-SLZ.cpl.r.1852-01-01-00000.nc to /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/rest/1852-01-01-00000/CLM5-FATES_1599235597_PA-SLZ.cpl.r.1852-01-01-00000.nc
Archiving restarts for dart (esp)
rpointer_content unset, not creating rpointer file rpointer.unset
-------------------------------------------
Archiving restarts for date date(1853, 1, 1, 0, 0, 0)
-------------------------------------------
Archiving restarts for datm (atm)
Archiving restarts for clm (lnd)
copying file /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/CLM5-FATES_1599235597_PA-SLZ.clm2.r.1853-01-01-00000.nc to /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/rest/1853-01-01-00000/CLM5-FATES_1599235597_PA-SLZ.clm2.r.1853-01-01-00000.nc
Copying /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/CLM5-FATES_1599235597_PA-SLZ.clm2.h0.1852-12-31-00000.nc to /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/rest/1853-01-01-00000/CLM5-FATES_1599235597_PA-SLZ.clm2.h0.1852-12-31-00000.nc
 WARNING: ncdump -v locfnh /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/CLM5-FATES_1599235597_PA-SLZ.clm2.rh0.1853-01-01-00000.nc  failed rc=1
    out=
    err=ncdump: locfnh: No such variable
copying file /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/CLM5-FATES_1599235597_PA-SLZ.clm2.rh0.1853-01-01-00000.nc to /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/rest/1853-01-01-00000/CLM5-FATES_1599235597_PA-SLZ.clm2.rh0.1853-01-01-00000.nc
Archiving restarts for mosart (rof)
Archiving restarts for drv (cpl)
copying file /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/CLM5-FATES_1599235597_PA-SLZ.cpl.r.1853-01-01-00000.nc to /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/rest/1853-01-01-00000/CLM5-FATES_1599235597_PA-SLZ.cpl.r.1853-01-01-00000.nc
Archiving restarts for dart (esp)
Archiving history files for datm (atm)
Archiving history files for clm (lnd)
copying /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/CLM5-FATES_1599235597_PA-SLZ.clm2.h0.1852-12-31-00000.nc to /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/lnd/hist/CLM5-FATES_1599235597_PA-SLZ.clm2.h0.1852-12-31-00000.nc
moving /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/CLM5-FATES_1599235597_PA-SLZ.clm2.h0.1852-01-01-00000.nc to /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/lnd/hist/CLM5-FATES_1599235597_PA-SLZ.clm2.h0.1852-01-01-00000.nc
moving /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/run/CLM5-FATES_1599235597_PA-SLZ.clm2.h0.1851-01-01-00000.nc to /ctsm_output/CLM5-FATES_1599235597_PA-SLZ/history/lnd/hist/CLM5-FATES_1599235597_PA-SLZ.clm2.h0.1851-01-01-00000.nc
Archiving history files for mosart (rof)
Archiving history files for drv (cpl)
Archiving history files for dart (esp)
st_archive completed
Submitted job case.run with id None
Submitted job case.st_archive with id None
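To follow up on the archive warnings later, a minimal sketch that pulls each `ncdump -v <variable> <file> failed` warning out of a saved copy of the st_archive output and reports the variable and file name. summarize_archive_warnings and the idea of saving the archive output to a log file are assumptions; it only reformats what the log already says.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<variable> <file name>" for every
# "WARNING: ncdump -v <variable> <path> failed" line in a saved st_archive log.
summarize_archive_warnings() {
  local log=$1
  grep -E '^ *WARNING: ncdump -v ' "$log" |
    awk '{ n = split($5, parts, "/"); print $4, parts[n] }'   # $4 = variable, $5 = path
}
```

For the run above this would report locfnh for the two clm2.rh0 files. Checking one of those files by hand with `ncdump -h <file> | grep locfnh` shows whether the variable is really absent from its header.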