
NWT_CLM

This repository contains the scripts necessary for running CLM point simulations at Niwot Ridge using Tvan forcing data and for analyzing the resulting data.

Niwot scripts workflow:

  1. Clean the L1 Tvan data using clean_tvan_data.R*
  2. Gap fill and generate .nc forcings with prepare_forcings_for_clm.R
    • also requires nwt_QRUNOFF.ipynb to supplement wet meadow precipitation
  3. Run the model at Niwot Ridge by following the instructions in CLM_instructions.md
  4. Download and format observations for comparison with the model using prepare_obs_for_comparison.R
  5. Format model output for comparisons with observations using prepare_sim_for_comparison.R
  6. Create comparison plots between simulation and observations with plot_obs_sim_comparisons.R
  7. Additional comparison plots between simulation and observations with plotFluxes_TVan_CLM.ipynb & plot_NWTcommunities_CLM.ipynb

*This script will be rendered obsolete once the Tvan data is available on AmeriFlux

**Figure 1**: NWT_CLM workflow overview

Quick start on installing this repo [most relevant for using python code]

First fork this repository:

Then clone your fork:

git clone https://github.com/$USER/NWT_CLM.git

Creating a python environment

Create the environment with:

cd NWT_CLM
conda env create -f environment.yml

You should then have ctsm_py available as an environment in JupyterHub

Then install the utilities:

conda activate ctsm_py
pip install -e .

How to run each script

Clean L1 data

1. clean_tvan_data.R

This script cleans up Tvan L1 data produced by the tvan_L1_preprocess.R script from the Niwot LTER's repository. It is a temporary script meant to add a few extra cleaning steps to the L1 Tvan data until that data can be uploaded to AmeriFlux. The script reads in the ReddyProc-ready data output by tvan_L1_preprocess.R, filters several problem spots, plots yearly comparisons between the filtered and unfiltered data, downloads Saddle Met data from EDI to fill gaps in air temperature after 2016, and writes the data out to two files named tvan_[tower]_[start_timestamp]_to_[end_timestamp]_flux_P_reddyproc_cleaned.txt

Inputs

  1. ReddyProc-ready files from tvan_L1_preprocess.R. This script expects those files to have the following variables (a quick check for these is sketched after this list):

    • NEE - Net ecosystem exchange (umolm-2s-1)
    • LE - Latent heat flux (Wm-2)
    • H - Sensible heat flux (Wm-2)
    • Ustar - friction velocity (ms-1)
    • Tair - Air temperature (degC)
    • VPD - Vapor pressure deficit (kPa)
    • rH - relative humidity (unitless fraction)
    • U - Wind speed (ms-1)
    • P - Atmospheric Pressure (kPa)
    • Tsoil - Soil temperature (degC)
    • Year - Year of measurement (MST)
    • DoY - Day of year of measurement (MST)
    • Hour - decimal hour of measurement (MST)
  2. Air temperatures from the Saddle Met data on EDI; these are downloaded automatically. Since they are only meant to fill the gap in air temperature after 2016, only gaps from 2016 onward are filled with Saddle air temperature data.
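
As a quick sanity check on an input file, the sketch below (base R, hypothetical file name) reads a ReddyProc-style text file, assuming the usual header-row-plus-units-row layout and -9999 as the missing-value flag, and reports any expected variables that are absent.

```r
# Minimal sketch: verify a ReddyProc-ready file carries the variables this script expects.
# The file name is hypothetical; the header + units-row layout and the -9999 missing-value
# flag are assumptions about the ReddyProc text format, not taken from this repository.
required <- c("NEE", "LE", "H", "Ustar", "Tair", "VPD", "rH",
              "U", "P", "Tsoil", "Year", "DoY", "Hour")

tvan <- read.table("tvan_east_reddyproc.txt", header = TRUE, na.strings = "-9999")
tvan <- tvan[-1, ]                      # drop the units row that follows the header
tvan[] <- lapply(tvan, as.numeric)      # columns were read as character because of the units row

missing_vars <- setdiff(required, names(tvan))
if (length(missing_vars) > 0) {
  warning("Missing variables: ", paste(missing_vars, collapse = ", "))
}
```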

User Options

Outputs

The script creates a directory called supp_filtering in the DirOutBase location and saves the filtered data there. A directory to hold the Saddle Met data is created within supp_filtering, and if makeplots = TRUE a plots directory is also created within supp_filtering to hold the yearly plots of each variable.

File structure:

<DirOutBase>
└── supp_filtering
    ├── Plots
    |   └── [variable]_yearly_plots
    ├── tvan_[tower]_[start_timestamp]_to_[end_endtimestamp]_flux_P_reddyproc_supproc.txt
    └── saddle_met_data

Script Status

This script is currently necessary to further clean the Tvan forcings in preparation for prepare_forcings_for_clm.R, in addition to gap-filling one tower with the other. After a finalized version of the Tvan data is up on AmeriFlux, this script may no longer be necessary, or it may be used only for combining the two towers' data to fill any gaps.

The proposed modifications for this script once the Tvan data are on Ameriflux are:

  1. Copy the download_amflx() function and the Handle Radiation data section from prepare_forcings_for_clm.R into this script

  2. Modify the copied code as needed to automatically download and read in the NR-2/NR-3 data (Tvan east and west towers).

  3. Verify that the rest of the script as written works with the new format of tvan data.

top

Generate atmospheric forcings for CLM

2. prepare_forcings_for_clm.R

The prepare_forcings_for_clm.R script generates atmospheric forcings for CLM at Niwot Ridge. It assembles the forcings from observational data from three sources: daily precipitation data from the Saddle, distributed into half-hourly values according to the method laid out in Wieder et al. 2017; half-hourly radiation data from the NR1 AmeriFlux tower; and the remaining forcing variables from the Tvan towers at Niwot.

NOTE: For this script to work, the user must have an AmeriFlux username and account.

Inputs

  1. Tvan data from the Niwot Ridge Tvan towers; either tower can be used, or both. If both are used, one tower is used to gap-fill the other; for the variables that are fed into the model, the two towers agree well. Right now the user specifies the location of the data generated by supplemental_cleaning.R, but eventually the data will be on AmeriFlux and the download_amflx() function can be used to download it.

  2. Saddle daily precipitation data; these are downloaded automatically from EDI, as are C1 precipitation data from USCRN. A blowing-snow correction is applied to the Saddle precipitation for the months of Oct-May (see William et al. 1988). The C1 data are used to distribute the Saddle daily precipitation proportionally into 30-minute timesteps (a sketch of this idea follows this list).

  3. AmeriFlux NR1 tower radiation data; these are downloaded automatically and provide the shortwave and longwave radiation for the forcings.
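
The following sketch illustrates the proportional disaggregation idea described in input 2; the object and column names (c1_precip, saddle_precip, c1_p, saddle_p_daily) are illustrative placeholders, not the script's actual objects.

```r
# Sketch: distribute Saddle daily precipitation into 30-minute steps in proportion
# to the C1 precipitation observed in each step (names are hypothetical).
library(dplyr)

half_hourly <- c1_precip %>%                       # columns: timestamp, c1_p (mm per 30 min)
  mutate(date = as.Date(timestamp)) %>%
  left_join(saddle_precip, by = "date") %>%        # columns: date, saddle_p_daily (mm)
  group_by(date) %>%
  mutate(weight = if (sum(c1_p, na.rm = TRUE) > 0)
                    c1_p / sum(c1_p, na.rm = TRUE)
                  else 1 / n(),                    # even split on days when C1 recorded no precip
         precip_30min = saddle_p_daily * weight) %>%
  ungroup()
```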

User Options

Options currently under development:

Outputs

The data folder

The plots folder

Directory structure:

<DirOutBase>
└── <data_version>
    ├── data
    │   ├── tvan_forcing_data_both_towers_2007-05-11_2020-08-11.txt
    │   ├── tvan_forcing_data_flagged_both_towers_2007-05-11_2020-08-11.txt
    │   ├── tvan_forcing_data_precip_mods_both_towers_2007-05-11_2020-08-11.txt
    │   ├── dry_meadow
    │   │   ├── 2007-05.nc
    |   |   ....
    │   │   └── 2020-08.nc
    │   ├── fell_field
    │   │   ├── 2007-05.nc
    |   |   ....
    │   │   └── 2020-08.nc
    │   ├── moist_meadow
    │   │   ├── 2007-05.nc
    |   |   ....
    │   │   └── 2020-08.nc
    │   ├── original
    │   │   ├── 2007-05.nc
    |   |   ....
    │   │   └── 2020-08.nc
    │   ├── snow_bed
    │   │   ├── 2007-05.nc
    |   |   ....
    │   │   └── 2020-08.nc
    │   └── wet_meadow
    │       ├── 2007-05.nc
    |       ....
    │       └── 2020-08.nc
    └── plots
        ├── 2007-05-11_2020-08-10_required_forcing_postgapfilling.png
        ├── 2007_yearly_gap_plots_postgapfilling.png
        ....
        ├── 2020_yearly_gap_plots_postgapfilling.png
        ├── all_years_gap_plots.png
        ├── yearly_gap_plots_2007.png
        ....
        └── yearly_gap_plots_2020.png
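
To take a quick look at one of the monthly forcing files, something like the sketch below works (ncdf4; the path is illustrative). Listing the variables first is the safest starting point, since the names are whatever prepare_forcings_for_clm.R wrote.

```r
# Sketch: inspect a monthly forcing file (path is illustrative).
library(ncdf4)

nc <- nc_open("<DirOutBase>/<data_version>/data/dry_meadow/2007-05.nc")
print(names(nc$var))   # names of the forcing variables written by the script
print(nc)              # full summary: dimensions, units, attributes
nc_close(nc)
```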

Script Status

This script is mostly done, but could be improved by implementing the option to automatically incorporate simulated runoff from the moist meadow community. The option is currently written into the user parameters as a placeholder, but it is not implemented. The script expects the runoff data to be formatted as follows:

With this option the user would provide a data file from a simulated moist meadow run containing two columns: a timestamp column called "timestamp" (each timestamp represents the state at the end of the 30-minute sampling period) and a column containing the runoff amounts in mm/s from the moist meadow simulation. If provided, these data are added to the wet meadow precipitation. If not provided, the wet meadow precipitation is 75% of the observed precipitation without any added runoff.

Steps to implement:

  1. Proposed but untested code for this option exists in the script at lines 1727-1733 in the Prepare 4 different precipitation regimes for the different vegetation communities section. It needs to be tested and verified to be working.

  2. Currently there is no code to load the simulated runoff file; that needs to be written. Ideally these data would be loaded at the beginning of the Prepare 4 different precipitation regimes for the different vegetation communities section.

  3. Another possible improvement would be a function to extract the simulated runoff from a netcdf file. Currently the code as written expects a data frame, and producing that data frame is left to the user (a sketch of such a helper follows).
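
As a starting point for step 3, a helper along the following lines could pull the runoff out of a CLM history file into the expected two-column data frame. The QRUNOFF variable name and the "days since ..." time units are assumptions about the history output and should be checked against the actual file.

```r
# Hypothetical helper: extract simulated runoff from a CLM history file into the
# two-column data frame described above (timestamp + runoff in mm/s).
library(ncdf4)

extract_runoff <- function(nc_file) {
  nc <- nc_open(nc_file)
  on.exit(nc_close(nc))
  runoff     <- as.vector(ncvar_get(nc, "QRUNOFF"))        # assumed variable name; mm/s
  time_raw   <- ncvar_get(nc, "time")
  time_units <- ncatt_get(nc, "time", "units")$value       # e.g. "days since 2007-01-01 00:00:00"
  origin     <- sub("^days since ", "", time_units)
  timestamp  <- as.POSIXct(origin, tz = "UTC") + time_raw * 86400
  data.frame(timestamp = timestamp, QRUNOFF = runoff)
}
```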

top

Run the model

3. CLM_instructions

Now it's time to run CLM; see CLM_instructions.md for the step-by-step setup and run instructions. After running the model we'll compare the output with observations.

top

Download and format observations

4. prepare_obs_for_comparison.R

Workflow for collating Niwot LTER data in preparation for comparing observations with simulated data. The purpose of this script is to read in observational data from Niwot and summarize it at three temporal levels: diurnal by season, daily (day of year), and annual.
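
A minimal sketch of that summarization pattern with dplyr/lubridate follows; the column names and the season grouping are illustrative, not necessarily the script's own.

```r
# Sketch: the three summary levels (diurnal by season, daily by day of year, annual).
library(dplyr)
library(lubridate)

obs <- obs %>%
  mutate(year   = year(timestamp),
         doy    = yday(timestamp),
         hour   = hour(timestamp),
         season = case_when(month(timestamp) %in% c(12, 1, 2) ~ "DJF",
                            month(timestamp) %in% 3:5         ~ "MAM",
                            month(timestamp) %in% 6:8         ~ "JJA",
                            TRUE                              ~ "SON"))

diurnal_seasonal <- obs %>%
  group_by(season, hour) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)), .groups = "drop")

daily <- obs %>%
  group_by(doy) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)), .groups = "drop")

annual <- obs %>%
  group_by(year) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)), .groups = "drop")
```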

Inputs

User Options

Outputs

This script outputs three files: the diurnal-seasonal, daily, and yearly summaries of the observations.

Script Status

This script could be improved by expanding it to include observations of Saddle grid productivity data and creating a summary of annual GPP/NPP/ANPP observations (as was done in Wieder et al. 2017, fig. 5). There is some code for this in the Handle Saddle Grid Productivity data section, but it is not currently used by the downstream plotting script, and the units are likely incorrect. Determining how best to separate the available data into GPP/NPP/ANPP is another challenge.

There also appear to be some bugs with the Soil Moisture and Temperature readings. In particular the fell field soil moisture and temperature are likely unreliable (they come from the Tvan measurements).

Other observations that could be added but are not currently in the code at all: growing season length, soil moisture stress, delta N limitation, and biomass. Determining what long-term data to use for these observations, downloading them, and converting them into a usable form will be the next step in bringing this script up to scratch.

In general, there is a lot of future work to be done on this script before we can recreate all of the plots in Wieder et al. 2017 and beyond.

top

Format model output

5. prepare_sim_for_comparison.R

The goal of this script is to read in netcdf output from a CLM model simulation and convert it into tab-delimited files that can be compared with observational data. The data produced are half-hourly but are also summarized at three levels: diurnal by season, daily (day of year), and annual. Depending on the vegetation community, a different soil level is used for the upper layer of soil moisture and soil temperature data, because Tvan soil data are collected at 10 cm while the saddle sensor network soil data are collected at 5 cm (see the sketch below for picking the nearest model layer).
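
For reference, the model soil layer closest to a target depth can be picked from the levgrnd coordinate in the history file. The sketch below assumes ncdf4, a single-point history file, and the standard TSOI variable; the file name is hypothetical.

```r
# Sketch: select the CLM soil layer nearest a target depth (5 cm or 10 cm).
library(ncdf4)

nc <- nc_open("case_name.clm2.h0.2008-01.nc")   # hypothetical history file name
depths <- ncvar_get(nc, "levgrnd")              # soil layer node depths (m)
layer_5cm  <- which.min(abs(depths - 0.05))     # saddle sensor network comparisons
layer_10cm <- which.min(abs(depths - 0.10))     # Tvan-based comparisons

tsoi <- ncvar_get(nc, "TSOI")                   # often [levgrnd, time] for a single-point run
tsoi_5cm <- tsoi[layer_5cm, ]                   # check the dimension order before indexing
nc_close(nc)
```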

Inputs

User Options

Outputs

A folder in the base output directory named after the case_name with:

  1. Diurnal-seasonal data: all chosen variables averaged by hour of the day across years by season
  2. Daily Data: Daily (day-of-year) means and standard deviations of all chosen variables.
  3. Annual data: Annual means and standard deviations of all variables
  4. Unsummarized data: all chosen variables at all timestamps during the simulation
  5. Unit definitions: Units for each of the variables that are written out.
  6. Mean_annaual_summaries_vars: mean annual summaries of several variables ("GPP", "NPP", "ET", "TOTVEGC") specified in the static workflow parameters. NOTE: if ET or TOTVEGC are not specified by the user in usr_var they will not be summarized since they are not included in the default variable list.
  7. Max_elai_summary: a summary by vegetation community of the max elai per year and averaged over all years (sketched below).

Note: For outputs 1-4, data from all available vegetation communities are concatenated and saved to a single file.
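
For output 7, the intended calculation is roughly the following dplyr sketch (column names are hypothetical):

```r
# Sketch of the Max_elai_summary idea: yearly maximum ELAI per community,
# then the mean of those yearly maxima across all years.
library(dplyr)

max_elai_summary <- sim %>%
  group_by(community, year) %>%
  summarise(max_elai = max(elai, na.rm = TRUE), .groups = "drop_last") %>%
  summarise(mean_max_elai = mean(max_elai), .groups = "drop")
```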

Script Status

This script is listed as "Done".

top

Comparing model and obs

6. plot_obs_sim_comparisons.R

This script creates plots comparing observation and simulation data in the style of Wieder et al. 2017.

Input

User Options

Outputs

Three plots comparing the fluxes, soil moisture, and snow depth from the simulation to observations.

Script Status

This script is listed as "can be improved". It can primarily be improved by re-creating figures 5-7 of Wieder et al. 2017 (however, most of these plots require changes to prepare_obs_for_comparison.R).

In addition to the plots mentioned above, only some of the plots created will compare across all vegetation community types. Ideally, the script would be modified so that each plot could be like the snow depth plot, where all communities are compared to each other.

Finally, there are a number of plotting aesthetics that could be improved.

top

Comparing model and obs II

7a. plotFluxes_TVan_CLM.ipynb

Makes diel, seasonal, and annual plots of simulated and observed fluxes from Tvan.

7b. plot_NWTcommunities_CLM.ipynb

Plots simulations from the five saddle communities and compares them to observations from the Saddle grid and the Saddle sensor network.

Both scripts have the advantage of reading model output directly from /glade/scratch, and they are similar to scripts being developed for tower simulations at NEON sites.

Dependencies: utils.py provides some additional utilities that are used sparingly.

top