NGEET / fates-containers

Repository for containerized version of fates for use in future tutorials

Initial single-site met forcing example #32

Closed serbinsh closed 3 years ago

serbinsh commented 3 years ago

Description:

An example command-line case script with customizable options. This would be an interim script where most of the inputs are provided for a single site/pixel. It is expected to be followed by a script that provides all inputs cropped to the same lat/lon.

Details

docker run -t -i --hostname=docker --user $(id -u):$(id -g) -v ~/scratch/ctsm_fates:/output \
-v /Users/sserbin/Data/cesm_input_datasets:/inputdata -v ~/Data/GitHub/docker-fates-tutorial/scripts:/scripts/ \
ngeetropics/fates-ctsm-gcc650:latest /scripts/create_ctsm-fates_single-site_case.sh --site_name=PA-SLZ \
--compset=I2000Clm50FatesGs --start_year='2010-01-01' --num_years=2 --run_type=startup --met_start=2010 \
--met_end=2012 --resolution=0.9x1.25 --output_vars=output_vars.txt --output_freq=H --descname=siterun --debug=FALSE

Collaborators:

Docker image component versions

Base OS: gcc655
Host land model: CTSM
FATES tag: release-clm5.0.30-143-gabcd593; Cabcd593-F3248e63

Checklist:

Test results

Build test on personal repo:
Test case results:

serbinsh commented 3 years ago

Associated dataset on OSF. This dataset can be retrieved using the OSF Python API, like so:

10-4-3-157:~ sserbin$ osf -p kv93n fetch single_site_forcing/GSWP3_PA-SLZ_met_forcing_partial.tar.gz
100%|█████

Then extract the archive into your CESM input data directory.
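A minimal example of unpacking the archive once the download finishes (run from wherever the tarball was saved; the destination depends on where you keep your CESM input data):

tar -xzvf GSWP3_PA-SLZ_met_forcing_partial.tar.gz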

serbinsh commented 3 years ago

This is a very rough outline of how I have been envisioning a basic workflow for Docker site runs, based on previous experimentation before migrating here. It doesn't really touch on Jupyter Notebooks yet, but it would be easy to move in that direction too.

Also, so far this example does not yet provide the met forcing data AND all other associated compset files. My Python script that extracts all the files to a single point got corrupted, but it worked before, so I just need to regenerate it; theoretically this will be easy.

But for now what we can show is how to pull down the met data from OSF, build the custom case, and submit it. Unfortunately, as I said, this currently requires downloading other files for the case, but that happens automagically using the same backend provided via CESM.

So the basic steps are (to be expanded on in the wiki):

1) git clone git@github.com:NGEET/docker-fates-tutorial.git

2) cd docker-fates-tutorial/scripts

3) Install the Python osfclient package to use the OSF API:

pip3 install osfclient

4) Grab the met forcing data from OSF for PA-SLZ. First edit download_singlesite_forcing_data.sh to reflect where you want to create the main CESM data directory and the sub-directory containing the PA-SLZ forcing, surface, and domain files: edit line 17, the bash variable ${cesm_data_dir} (see the example snippet after this step), then run:

./download_singlesite_forcing_data.sh
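For reference, the line 17 edit might look something like this (the path below is just a placeholder; point it at wherever you keep your CESM input data):

cesm_data_dir=/Users/yourname/Data/cesm_input_datasets  # placeholder path, adjust to your system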

5) Build a test case where you point Docker to the host location of the CESM datasets, e.g.:

docker run -t -i --hostname=docker --user $(id -u):$(id -g) -v ~/scratch/ctsm_fates:/output \
-v /Users/sserbin/Data/cesm_input_datasets:/inputdata -v ~/Data/GitHub/docker-fates-tutorial/scripts:/scripts/ \
ngeetropics/fates-ctsm-gcc650:latest /scripts/create_ctsm-fates_single-site_case.sh --site_name=PA-SLZ \
--compset=I2000Clm50FatesGs --start_year='2000-01-01' --num_years=10 --run_type=startup --met_start=2000 \
--met_end=2010 --resolution=0.9x1.25 --output_vars=output_vars.txt --output_freq=H --descname=siterun --debug=FALSE
Building cesm with output to /output/PA-SLZ.siterun.fates.docker.I2000Clm50FatesGs.Cabcd593-F3248e63.2020-11-04_20-12-17/bld/cesm.bldlog.201104-201246
Time spent not building: 2.209011 sec
Time spent building: 386.824052 sec
MODEL BUILD HAS FINISHED SUCCESSFULLY
*** Finished building new case in CASE: /output/PA-SLZ.siterun.fates.docker.I2000Clm50FatesGs.Cabcd593-F3248e63.2020-11-04_20-12-17 ***

*****************************************************************************************************
If you built this case interactively then:
To submit the case change directory to /output/PA-SLZ.siterun.fates.docker.I2000Clm50FatesGs.Cabcd593-F3248e63.2020-11-04_20-12-17 and run ./case.submit

If you built this case non-interactively then change your Docker run command to:

docker run -t -i --hostname=docker --user $(id -u):$(id -g) --volume /path/to/host/inputs:/inputdata \
--volume /path/to/host/outputs:/output docker_image_tag /bin/sh -c 'cd /output/PA-SLZ.siterun.fates.docker.I2000Clm50FatesGs.Cabcd593-F3248e63.2020-11-04_20-12-17 && ./case.submit'

Where:
/path/to/host/inputs is your host input path, such as /Volumes/data/Model_Data/cesm_input_datasets
/path/to/host/outputs is your host output path, such as ~/scratch/ctsm_fates
docker_image_tag is the docker image tag on your host machine, e.g. ngeetropics/fates-ctsm-gcc650:latest

Alternatively, you can use environment variables to define the constants, e.g.:
export input_data=/Volumes/data/Model_Data/cesm_input_datasets
export output_dir=~/scratch/ctsm_fates
export docker_tag=ngeetropics/fates-ctsm-gcc650:latest

And run the case using:
docker run -t -i --hostname=docker --user $(id -u):$(id -g) --volume ${input_data}:/inputdata \
--volume ${output_dir}:/output ${docker_tag} /bin/sh -c 'cd /output/PA-SLZ.siterun.fates.docker.I2000Clm50FatesGs.Cabcd593-F3248e63.2020-11-04_20-12-17 && ./case.submit'
*****************************************************************************************************

6) Run the case, e.g.

docker run -t -i --hostname=docker --user $(id -u):$(id -g) -v ~/scratch/ctsm_fates:/output \
-v /Users/sserbin/Data/cesm_input_datasets:/inputdata ngeetropics/fates-ctsm-gcc650:latest \
/bin/sh -c 'cd /output/PA-SLZ.siterun.fates.docker.I2000Clm50FatesGs.Cabcd593-F3248e63.2020-11-04_20-12-17 && ./case.submit'
glemieux commented 3 years ago

Also, so far this example does not yet provide the met forcing data AND all other associated compset files. My Python script that extracts all the files to a single point got corrupted, but it worked before, so I just need to regenerate it; theoretically this will be easy.

I'm assuming that your script will be hosted on this repo eventually, yes? As an aside, given the ongoing single-point discussion, I was wondering whether single-point scripts and data handling should live elsewhere at some point.

But for now what we can show is how to pull down the met data from OSF, build the custom case, and submit it. Unfortunately, as I said, this currently requires downloading other files for the case, but that happens automagically using the same backend provided via CESM.

Should we build OSF into the next version of the hlm-baseos? Alternatively, maybe OSF has a containerized version of their app that we could leverage in the future (similar to the containerized version of jupyter)? My question here is motivated by the idea that we should reduce the amount of management of supporting apps (like osf) by baking them into the baseos image itself (or eventually having some sort of docker compose setup that runs an 'official' osf image, if one exists).

serbinsh commented 3 years ago

@glemieux one thing: the osf client app is Python. IMHO we could/should build a toolkit docker image with Python, Jupyter, sample plotting scripts, osfclient, etc. As you say, another layer... what do you think?

I was thinking that, like the example plotting you provide for the FATES tutorial, we could provide some basic plotting functions in the tools docker image that a user can run to grab met files or to plot output from a container run, etc.

glemieux commented 3 years ago

@glemieux one thing: the osf client app is Python. IMHO we could/should build a toolkit docker image with Python, Jupyter, sample plotting scripts, osfclient, etc. As you say, another layer... what do you think?

Yes, I agree. I think we can create another PR to add another docker image that layers on top of the hlm-fates image with all these things incorporated. I had thought of putting osf into the baseos image, but you're right, putting it in a layer on top is better since it's not necessary for running fates if you already have access to the data.
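A rough sketch of what such a layered image might look like (the base tag and package list below are only assumptions for illustration, not a spec):

# Hypothetical Dockerfile for a "tools" layer on top of the existing HLM-FATES image
FROM ngeetropics/fates-ctsm-gcc650:latest
# assumes pip3 is available in the base image
RUN pip3 install osfclient jupyter matplotlib xarray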

I think I could whip this together in the next couple of days unless you're already working on it @serbinsh.

glemieux commented 3 years ago

Test case on lobata ran through 10 years successfully.

As an aside: in setting up the case initially, I mistakenly set up the single_site data folder in a location that was not inside the cesm_input_data directory (i.e. the location of the other component data). When I started the run, it (correctly) began downloading all the other component data. It had gotten through about 2 GB worth of data before I realized what I had done; a good example of user error :P

serbinsh commented 3 years ago

@glemieux Yeah, I am not super happy about the current requirement that the "single_site" folder be placed where you have your existing CESM data. Obviously you can do this by mapping your full CESM data dir into docker, and for many cases this would be OK, but perhaps we could/should make this a variable that can be changed? Perhaps again via a cmd flag?

Oh wait, I think I now recall why... it's because we need a directory inside of the docker container to map the single_site folder to. We could add a second data dir in the container for these types of runs to use so we can separate the two.
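For illustration only, separating the two might look something like the command below; the /singlesitedata mount point and the --met_data_dir flag are hypothetical and would need to be added to the container and case script first:

docker run -t -i --hostname=docker --user $(id -u):$(id -g) -v ~/scratch/ctsm_fates:/output \
-v /Users/sserbin/Data/cesm_input_datasets:/inputdata -v ~/Data/single_site_forcing:/singlesitedata \
-v ~/Data/GitHub/docker-fates-tutorial/scripts:/scripts/ ngeetropics/fates-ctsm-gcc650:latest \
/scripts/create_ctsm-fates_single-site_case.sh --site_name=PA-SLZ --met_data_dir=/singlesitedata # plus the remaining flags from step 5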