ESCOMP / CTSM

Community Terrestrial Systems Model (includes the Community Land Model of CESM)
http://www.cesm.ucar.edu/models/cesm2.0/land/

Documentation need for generic single point with nuopc #1565

Open wwieder opened 2 years ago

wwieder commented 2 years ago

Here are the steps needed to create and run a single point case with NUOPC.
Users will have to create a surface dataset first and then make a few manual changes (mainly to shell_commands).

NOTE the mods to user_nl_mosart seem like an (unnecessary?) gotcha

Erik's example case is here:

/glade/work/erik/ctsm_worktrees/branch1/cime/scripts/cases/RandomBoreal_0

With create_newcase command:

./create_newcase --case cases/RandomBoreal_0 --compset I2000Clm51Bgc --res CLM_USRDAT --driver nuopc --user-mods-dirs newton_krylov_spinup --mpilib mpi-serial --run-unsupported

(Note: I used the mpi-serial library for MPI because it's just a single point case.)

I did these xml commands (added them to shell_commands)

./xmlchange CLM_USRDAT_NAME="BOREAL1"
./xmlchange PTS_LON=55.
./xmlchange PTS_LAT=57.958115183246
./xmlchange CASESTR="Random Boreal"

NOTE: CASESTR is not required, but makes sure a few things are set with something besides UNSET

Added this to user_nl_clm

fsurdat = '/glade/scratch/wwieder/single_point/surfdata_hist_16pfts_Irrig_CMIP6_simyr2000_RandomBoreal_c210823.nc'

and this to user_nl_mosart

frivinp_rtm = '/dev/null'

This is required with the latest updates in externals when MOSART_MODE=null (i.e., when the compset has MOSART rather than SROF). It is a little glitch that we should fix in CTSM so you don't have to do this. (There's a MOSART issue about this.)
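The manual changes above can be collected in one place. The following is a minimal sketch, assuming you want them in a user-mods directory that is passed to create_newcase with --user-mods-dirs; the helper name write_single_point_mods and its two arguments are hypothetical, and the values written are the ones quoted above for the RandomBoreal example.

```shell
#!/usr/bin/env bash
# Sketch only: write_single_point_mods is a hypothetical helper, not part of CTSM or CIME.
# It writes the three files described above into a user-mods directory.
write_single_point_mods() {
  local mods_dir=$1   # directory to pass to create_newcase via --user-mods-dirs
  local fsurdat=$2    # path to the single-point surface dataset you created
  mkdir -p "$mods_dir"

  # xmlchange commands from the example case (CASESTR is optional)
  cat > "$mods_dir/shell_commands" <<'EOF'
./xmlchange CLM_USRDAT_NAME="BOREAL1"
./xmlchange PTS_LON=55.
./xmlchange PTS_LAT=57.958115183246
./xmlchange CASESTR="Random Boreal"
EOF

  # Point CTSM at the single-point surface dataset
  printf "fsurdat = '%s'\n" "$fsurdat" > "$mods_dir/user_nl_clm"

  # Workaround described above for compsets that include MOSART
  printf "frivinp_rtm = '/dev/null'\n" > "$mods_dir/user_nl_mosart"
}
```

For example, `write_single_point_mods my_single_point_mods /path/to/your/surfdata.nc` creates the directory, which can then be named in --user-mods-dirs. Adjust CLM_USRDAT_NAME, PTS_LON, and PTS_LAT for your own site.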

Originally posted by @ekluzek in https://github.com/ESCOMP/CTSM/issues/1457#issuecomment-905259041

Definition of Done:

adrifoster commented 2 years ago

Thanks @wwieder! I am working on adding the relevant commands/file updates for using subset climate data with NUOPC.

@ekluzek the user_nl_datm_streams file gives good notes on how to update things like file names. But it also mentions the meshfile. If I remember correctly, we do not need mesh files for single-point runs? Is this correct for the datm data as well? If so, should we update this meshfile value to be blank?

ekluzek commented 2 years ago

OK, I wasn't sure about this myself. But, I looked into how it's done for the NEON data and we should be able to copy how it does it.

The header for the NEON data looks like this...

  <stream_info name="NEON.NIWO">
   <taxmode>cycle</taxmode>
   <tintalgo>linear</tintalgo>
   <readmode>single</readmode>
   <mapalgo>none</mapalgo>
   <dtlimit>1.5</dtlimit>
   <year_first>2018</year_first>
   <year_last>2018</year_last>
   <year_align>2018</year_align>
   <vectors>null</vectors>
   <meshfile>none</meshfile>
   <lev_dimname>null</lev_dimname>

So mapalgo is none, meshfile is none, and there's only one point of data. If all of those are true we shouldn't need a meshfile for the datm part of this. For regional cases with more than one datapoint for datm we would need to provide a meshfile.

ekluzek commented 2 years ago

So @adrifoster yes, but rather than leaving meshfile blank, you should set it to "none" and also set mapalgo to "none".

wwieder commented 2 years ago

for what it's worth, setting up this single point run was super simple after creating the surface dataset, but it's just using the global datm data

/glade/work/wwieder/ctsm/ctsm5.1_N-K_test/cime/scripts/RandomBoreal_0c

adrifoster commented 2 years ago

Great, thanks! @ekluzek

Definitely, @wwieder I think it will be easy going once we get the user_nl_datm_streams files set up for the datm data too.

adrifoster commented 2 years ago

@ekluzek I tried setting mapalgo=none in the user_nl_datm_streams file but I got this error:

Create namelist for component datm
   Calling /project/tss/afoster/ctsm_fates/components/cdeps/datm/cime_config/buildnml
ERROR: mapalgo can only have values of ['bilinear', 'nn', 'redist', 'mapconsd', 'mapconf'] for stream CLMGSWP3v1.Solar in file /project/tss/afoster/FATES_cases/BONA_080_719baba6_38495279/user_nl_datm_streams

ekluzek commented 2 years ago

@adrifoster OK thanks for pointing that out. This sounds like a problem in CDEPS. Let me look into this a bit, I can probably give you a workaround for it. But, we'll likely need to get a change in CDEPS to fix this. Since it allows none for NEON it obviously can work this way.

ekluzek commented 2 years ago

@adrifoster I realized there are two cases here. One is when you are pointing to a site with tower data, and the other is when you are pointing to forcing data with a single point extracted out. In the first case you should set DATM_MODE=1PT, which should get you everything you want other than filenames. In the latter case you want to explicitly change mapalgo to none and meshfile to none, but leave the others alone.
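For the second case (forcing data with a single point extracted), the overrides could be appended to user_nl_datm_streams along these lines. This is a sketch under assumptions: the helper name set_streams_single_point is hypothetical, and the `<streamname>:<variable> = <value>` line format is assumed from the notes in the user_nl_datm_streams file. Only the stream name CLMGSWP3v1.Solar appears in this thread; check your own file for the other stream names. Depending on the CDEPS version, buildnml may still reject mapalgo = none with the error reported above.

```shell
#!/usr/bin/env bash
# Sketch only: set_streams_single_point is a hypothetical helper.
# Appends meshfile/mapalgo overrides for each named DATM stream.
set_streams_single_point() {
  local nl_file=$1; shift   # e.g. user_nl_datm_streams in the case directory
  local stream
  for stream in "$@"; do
    {
      # Assumed override syntax: <streamname>:<variable> = <value>
      echo "${stream}:meshfile = none"
      echo "${stream}:mapalgo = none"
    } >> "$nl_file"
  done
}
```

For example, `set_streams_single_point user_nl_datm_streams CLMGSWP3v1.Solar` appends the two override lines for that stream; pass additional stream names as further arguments.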

adrifoster commented 2 years ago

Here is the script I have been using to create single-point cases - a lot of this can go in the user_mods, but perhaps some of it is redundant or out of date with NUOPC?

#!/usr/bin/env bash

#
# #########################
# Purpose: Create and build a single-point case for CTSM runs
# Author: Adrianna C. Foster
# Date: September, 2021
# bash version 4.2.46
# #########################
# #########################
# Input format: text file
# #########################
# #########################
# Notes: Makes use of the config_parse script. If you want to add more
#        parameters, see the notes for that script.
#        This also assumes you have already subset the correct surface, domain,
#        and datm files. See subset_data.py script.

## Parameters ------------------------------------------------------------------
# CONF              - config file name             (argument for script)
# MACH              - machine (izumi/cheyenne)     (in config file)
# PROJECT           - project code                 (in config file)
# SRCDIR            - CTSM source code directory   (in config file)
# TAG               - name of case                 (in config file)
# SITE              - site name                    (in config file)
# FATES             - use fates or not             (in config file)
# PARAM_FILE        - FATES parameter file         (in config file)
# PARAM_DIR         - FATES param file location    (in config file)
# CASE_DIR          - case directory path          (in config file)
# COMP              - compset for run              (in config file)
# RES               - resolution of run            (in config file)
# STOPN             - how long to run              (in config file)
# RESUBN            - times to resubmit run        (in config file)
# STOPVAL           - units for STOPN              (in config file)
# RESTVAL           - units for RESUBN             (in config file)
# STARTDATE         - start date for run           (in config file)
# DATMSTARTYR       - start year to loop datm over (in config file)
# DATMSTOPYR        - end year to loop datm over   (in config file)
# DATMMODE          - mode for data atm. component (in config file)
# CLM_USRDAT_DOMAIN - subset domain file           (in config file)
# CLM_DOMAIN_DIR    - subset domain location       (in config file)
# CLM_SURFDAT_DIR   - subset surface data location (in config file)
# CLM_USRDAT_SURDAT - subset surface data file     (in config file)
# USER_DATM_DIR     - location of datm.streams*    (in config file)
# WALL_TIME         - wallclock time               (in config file)
# LON               - longitude of the point       (in config file)
# LAT               - latitude of the point        (in config file)

## Config file name
if [ $# -eq 0 ]
  then
    echo "Enter config file name"
    read CONF
  else
    CONF=$1
fi

## Parse the config file to get parameters
source config_parse ${CONF}

## Get CTSM git version - this will go into case name
cd ${SRCDIR}
githashctsm=`git log -n 1 --format=%h`

## Get the FATES git version if running fates - this will also go into case name
if [[ "$FATES" == "1" ]]
then
  cd src/fates
  githashfates=`git log -n 1 --format=%h`

  ## Create case name with ctsm and fates githash
  case_name=${CASE_DIR}/${TAG}_${githashctsm}_${githashfates}
else
  ## Create case name with just ctsm githash
  case_name=${CASE_DIR}/${TAG}_${githashctsm}
fi

## Define CIME directory
base_dir=${SRCDIR}/cime/scripts

## Create the case
cd ${base_dir}
./create_newcase --case ${case_name} --res ${RES} --compset ${COMP} --project ${PROJECT} --run-unsupported --mach ${MACH}

cd ${case_name}

## Modify env_mach_pes file
## Use a single task and thread per component (single-point run)
./xmlchange NTASKS_ATM=1
./xmlchange NTASKS_LND=1
./xmlchange NTASKS_ROF=1
./xmlchange NTASKS_ICE=1
./xmlchange NTASKS_OCN=1
./xmlchange NTASKS_CPL=1
./xmlchange NTASKS_GLC=1
./xmlchange NTASKS_WAV=1
./xmlchange NTASKS_ESP=1

./xmlchange NTHRDS_ATM=1
./xmlchange NTHRDS_LND=1
./xmlchange NTHRDS_ROF=1
./xmlchange NTHRDS_ICE=1
./xmlchange NTHRDS_OCN=1
./xmlchange NTHRDS_CPL=1
./xmlchange NTHRDS_GLC=1
./xmlchange NTHRDS_WAV=1
./xmlchange NTHRDS_ESP=1

./xmlchange ROOTPE_ATM=0
./xmlchange ROOTPE_LND=0
./xmlchange ROOTPE_ROF=0
./xmlchange ROOTPE_ICE=0
./xmlchange ROOTPE_OCN=0
./xmlchange ROOTPE_CPL=0
./xmlchange ROOTPE_GLC=0
./xmlchange ROOTPE_WAV=0
./xmlchange ROOTPE_ESP=0

## Must do this for single-point runs
./xmlchange MPILIB=mpi-serial

## Run case-setup
./case.setup

## Modify env_run parameters - can add more parameters using the config_parse
## script
## Note for variables that require quotes you must use the double then single
## quotes approach for it to work.
./xmlchange --id STOP_N --val ${STOPN}
./xmlchange --id RUN_STARTDATE --val $STARTDATE
./xmlchange --id STOP_OPTION --val ${STOPVAL}
./xmlchange --id REST_OPTION --val ${RESTVAL}
./xmlchange --id RESUBMIT --val ${RESUBN}
./xmlchange --id CLM_FORCE_COLDSTART --val on
./xmlchange --id JOB_WALLCLOCK_TIME --val ${WALL_TIME}
./xmlchange --id DATM_YR_ALIGN --val ${DATMSTARTYR}
./xmlchange --id DATM_YR_START --val ${DATMSTARTYR}
./xmlchange --id DATM_YR_END --val ${DATMSTOPYR}
./xmlchange --id DATM_MODE --val ${DATMMODE}
./xmlchange PTS_LON=${LON}
./xmlchange PTS_LAT=${LAT}
./xmlchange --id CLM_USRDAT_NAME --val ${SITE}

## Need jobqueue for running on izumi
if [[ "$MACH" == "izumi" ]]
then
  ./xmlchange --id JOB_QUEUE --val verylong
fi

## Update the user_nl_clm file
## the fsurdat is used for updating the surface dataset file for
## singlepoint runs

if [[ "$FATES" == "1" ]]
then

cat > user_nl_clm <<EOF
fsurdat = '${CLM_SURFDAT_DIR}/${CLM_USRDAT_SURDAT}'
use_fates=.true.
fates_parteh_mode=1
fates_spitfire_mode=0
EOF

else
cat > user_nl_clm <<EOF
fsurdat = '${CLM_SURFDAT_DIR}/${CLM_USRDAT_SURDAT}'
EOF

fi

## Build and submit the case
./case.build
./case.submit

ekluzek commented 2 years ago

I created CDEPS issues for this discussion:

https://github.com/ESCOMP/CDEPS/issues/134 https://github.com/ESCOMP/CDEPS/issues/135

ekluzek commented 2 years ago

@adrifoster in looking at the above script there are some things that I think are important here. https://github.com/ESCOMP/CTSM/issues/1565#issuecomment-984026717

For it to be a supported script it should be in python and based off of the run_neon script. And the two scripts should possibly share some infrastructure.

You are setting some things I don't think you need to, such as PE settings. And some things are set by the compset, so possibly the choice of compset should be how they are set rather than being overridden explicitly. Also, by the way, to set NTASKS the same for all components you can do...

./xmlchange NTASKS=1

and that will set all components to 1 task.

I'd also set the wallclock limit by using the argument to create_newcase rather than through xmlchange. Because it's cleaner and because it won't also change the time for the st_archive script.
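As a rough sketch of that suggestion, the helper below prints (rather than runs) a create_newcase command that carries the wallclock limit, so it can be inspected before use. The function name print_create_newcase and its argument order are hypothetical; --walltime is assumed to be the create_newcase argument meant here, so confirm it with ./create_newcase --help in your CIME version.

```shell
#!/usr/bin/env bash
# Sketch only: print_create_newcase is a hypothetical helper.
# It prints the command instead of running it, so no CIME checkout is needed.
print_create_newcase() {
  local case_name=$1 res=$2 compset=$3 wall_time=$4
  # Wallclock limit is given at case creation rather than through xmlchange
  printf '%s ' ./create_newcase --case "$case_name" --res "$res" \
    --compset "$compset" --mpilib mpi-serial --walltime "$wall_time" \
    --run-unsupported
  printf '\n'
}
```

After creating the case, a single `./xmlchange NTASKS=1` in the case directory replaces the per-component NTASKS lines in the script above.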

adrifoster commented 2 years ago

Hey Erik,

This is just my own personal script, based on scripts that others have sent me, so I was not planning on having this be in the subset_data script. Obviously many things should be set by the individual user.

But good to know on the PE settings - I think you said to set these as a precaution but I will remove them in the future.

ekluzek commented 2 years ago

Here's the discussion on single point simulations that @adrifoster put together.

https://github.com/ESCOMP/CTSM/discussions/1833

samsrabin commented 1 month ago

From @ekluzek:

We need new instructions for the new methods involving subset_data. There are some hints in the PTCLM chapter about topics that should be put into the new subset_data section. For example, how do you do this for US-UMB, or for some generic tower site you have data for? Instructions for making NEON and PLUMBER2 datasets should also be in the User's Guide. Instructions and examples for the modify surfdata tools would be good to have as well.

For future reference, the "PTCLM chapter" referred to there can be restored by checking out a version of CTSM before the merge of whatever PR ends up resolving #2769. E.g., ctsm5.2.028.