HARPgroup / model_meteorology


Updated Scripts to Add Meteorology Data set #29

Open rburghol opened 2 years ago

rburghol commented 2 years ago

Document basic steps, with use examples, for the entire workflow here. This covers new iterations focused on a single land segment or a grouping (like sova, nova, ...), and is a condensed version of the more complete workflow given here: HARPgroup/HARParchive#62

All NLDAS2 scripts from download to WDM creation for a specific model met/prad scenario:

  1. Download Most Recent NLDAS2 Data: This GitHub issue walks the user through creating accounts and downloading data -- as of 4/13/2022, there is no script that allows us to download more than 1 year of data at a time.
    • [x] get_nldas_to_date - iterates through and retrieves all data available (see model_meteorology/sh/get_nldas_to_date; a minimal loop of this kind is sketched below)
      • Use:
      • cd /backup/meteorology/
      • get_nldas_to_date YYYY [ending jday=today]
      • Ex: get_nldas_to_date 2022
      • calls get_nldas_data.bash (in model_meteorology/sh/get_nldas_data.bash )
      • Note: this is run by a cron script on deq2: /etc/cron.daily/deq-drought-model
    • [x] Uses wget syntax provided by NLDAS:
      • [x] : wget --load-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies -np -r -NP -R "*.xml" -c -N --content-disposition https://hydro1.gesdisc.eosdis.nasa.gov/data/NLDAS/NLDAS_FORA0125_H.002/[YEAR]/[JULIAN DAY]
      • [x] Ex: wget --load-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies -np -r -NP -R "*.xml" -c -N --content-disposition https://hydro1.gesdisc.eosdis.nasa.gov/data/NLDAS/NLDAS_FORA0125_H.002/2002/001/
        • Note: This will download the first day of 2002. To download the entire year, remove 001/
  2. Process NLDAS2 Data into CSV files for each grid cell in model domain
    • [x] p5_g2a.bash generates CSV files for all grid cells in all land segments for P5
      • [x] Use: bash p5_g2a.bash [start as YYYYMMDDHH] [end as YYYYMMDDHH] [input data dir] [output data dir]
      • [x] Ex: bash /backup/meteorology/p5_g2a.bash 2020010100 2020123123 /backup/meteorology /backup/meteorology/out/grid_met_csv
    • [x] g2a_one.bash generates CSV files for one grid cell
      • [x] Use: bash g2a_one.bash [start as YYYYMMDDHH] [end as YYYYMMDDHH] [input data dir] [output data dir] [grid formatted x###y###]
      • [x] Ex: bash /backup/meteorology/g2a_one.bash 2020010100 2020123123 /backup/meteorology /backup/meteorology/out/grid_met_csv x393y93
    • [x] p5_g2a_all (@alexwlowe) - streamlines p5_g2a.bash: rather than special-casing the first year, it uses a single loop that handles any time frame.
      • Use: p5_g2a_all [start as YYYYMMDDHH] [end as YYYYMMDDHH] [input data dir] [output data dir]
      • Ex: ./p5_g2a_all 19840101 20201231 /backup/meteorology /backup/meteorology/out/grid_met_csv
    • [x] todo:
      • [x] Goal: I want to add a single config file that has these coordinate pairs in it. @kylewlowe is this doable?
      • [x] @alexwlowe Also, don't p5 and p6 cover the same grid areas? If so, we only need to run these once and can then generate model files for both models.
    • [x] grid2land.sh: generates CSV files for all grid cells in a given land segment. @Alexvt15 this script condenses some of your code from p5_g2a.bash above, and uses the technique of looking up coordinate pairs in "seg_maps" and executing the routines for just those cells (this technique is sketched below). This is NOT efficient if we did every landseg this way, since it would repeat overlapping cells, but for a single land segment it is much more efficient, since it calls NLDAS2_GRIB_to_ASCII once per year instead of multiple times per year.
      • Use: grid2land.sh [start as YYYYMMDDHH] [end as YYYYMMDDHH] [input data dir] [output data dir] [land segment]
      • Ex: grid2land.sh 1985010100 2020123123 /backup/meteorology /backup/meteorology/out/grid_met_csv A51031
  3. Create land seg CSVs (PREC, ...) from grid data for a land segment
    • [ ] Do a batch, like southern_a2l_timeframe.bash (see HARPgroup/HARParchive#156 )
      • [ ] Use: ``
      • [ ] Ex: ``
    • [x] Do a single a2l_one
      • [x] Use: a2l_one [start as YYYYMMDDHH] [end as YYYYMMDDHH] [input data dir] [output data dir] [land segment]
      • [x] Ex: /backup/meteorology/a2l_one 2020010100 2020123123 /backup/meteorology/out/grid_met_csv /backup/meteorology/out/lseg_csv A51035
      • @alexwlowe @kylewlowe I adapted this from your script. This was useful for me to learn the process you developed, but I also think it may be an efficient way to handle this, since it can save time when re-doing single segments (e.g., if the weather data had a glitch and had to be rerun). It finds the grid cells for a single land seg in the seg_maps file, creates a temporary file with only that segment in it, and then processes it. This "find segments from seg_maps" technique could also be useful elsewhere (see the sketch below).
    • [ ] QUESTION: Where do the "F" land segments come from? Phase 5.3 did not have these. See HARPgroup/HARParchive#164
  4. Create LandSeg RNMax file with LongTermAvgRNMax
    • [x] Note: This file has the maximum solar radiation for each day of year, used to compute cloud cover. The RNMax file for each land segment is expected to be in a directory above the "STARTDATE-ENDDATE" sub-folder of the other land segment met CSVs in the cbp tree. Thus, it is crucial that the RNMax file is ONLY updated using the largest timespan available, since otherwise this could create inconsistencies in cloud cover between WDMs of varying duration.
    • [x] Use: LongTermAvgRNMax landseg_csv_file_path rnmax_file_output_path num_segs lseg1 lseg2 lseg3...
    • [x] Ex: LongTermAvgRNMax /opt/model/p53/p532_alex/input/unformatted/nldas2/harp2021/1984010100-2020123123 /opt/model/p53/p532_alex/input/unformatted/nldas2/harp2021/RNMax 1 A51175
  5. Create WDMs (these scripts all call wdm_insert_ALL):
    • [x] All at once (p5 & p6): /backup/meteorology/wdm_generation_allLsegs.bash
    • [x] phase 5 only: wdm_generation_p5.bash
    • [x] phase 6 only: wdm_generation_p6.bash
    • [x] Single one: wdm_pm_one
      • Use: wdm_pm_one land_segment YYYYMMDDHH YYYYMMDDHH source version
      • Ex: wdm_pm_one A51031 1984010100 2020123123 nldas1221 harp2021
    • [x] wdm_insert_ALL expects a directory of text files for each met parameter to live in input/unformatted/[data_source]/[version] - so wdm_pm_one copies the files from the met source into there (creating those directories if they don't already exist)
    • @alexwlowe These are a great start (we can run the model now); here are some observations and tips on the above scripts:
      • I did a quick check for a northern river land seg (51107) and found it! Good stuff.
      • Estimated time to completion for generating all phase 5 WDMs (from existing CSVs) is 40-45 minutes. This is certainly tolerable if the other parts of the workflow (namely downloading) run quickly enough.
      • All 3 scripts hard-code the date range instead of taking it as an argument - we definitely need to fix that, since these will be crucial scripts run weekly.
      • the "all" script should simply call the northern and southern one after the other.
      • Really, our land segment list should live in one single place. Then any file that needs a list of all land segments should refer to that single file. As of now, we likely have a ton of files with lists of land segments in them.
      • Output of these scripts should go to the default directory /backup/meteorology/out/lseg_wdm/, rather than a sub-directory of a code directory in p532_alex
  6. Deprecated: See wdm_pm_one above. Copy WDMs to a model scenario with make_met_scenario.sh
    • [x] create directory to house WDM files ./input/scenario/climate/met/[met scenario name]
    • [x] copy created wdm files to directory
  7. Run the CBP model
    • [x] p5
    • [x] p6
  8. Exporting Runoff & Meteorology Data from CBP to VAHydro
    • [x] Export WDM files to text readable by vahydro database https://github.com/HARPgroup/cbp_wsm/issues/59
    • wdm_flow_csv:
      • Set up export directories in VAHydro/OM and export runoff for All landsegs in a watershed
      • Use: wdm_flow_csv [scenario] [riverseg] [start year] [end year]
      • Ex: cbp wdm_flow_csv CFBASE30Y20180615_vadeq JL1_6770_6850 1984 2020
    • Single Land Segment
      • Use: Rscript $CBP_ROOT/run/export/wdm_export_flow.R [scenario] [landseg] [syear] [eyear] [CBP_EXPORT_DIR] [CBP_ROOT]
      • Ex: Rscript $CBP_ROOT/run/export/wdm_export_flow.R CFBASE30Y20180615_vadeq N51003 1984 2020 /media/model/p6 /opt/model/p6/gb604b
    • [x] Final output must be: filename="/media/model/p6/out/land/$scenario/eos/${landseg}_0111-0211-0411.csv"
    • [x] wdm_export_land_flow() exports separate files for each flow component (111,211,411) for each land use in a land segment.
      • Ex output files:
      • forN51003_0111.csv
      • forN51003_0211.csv
      • forN51003_0411.csv
        
    • wdm_merge_land_flow: merges the separate per-component files into the combined ${landseg}_0111-0211-0411.csv expected above

  - [x] Ensure that the VAHydro land segment knows the scenario
  - [x] pre-load land segment data (see https://github.com/HARPgroup/vahydro/issues/86)
    - [x] [create_landseg_table.sh](https://github.com/HARPgroup/om/blob/master/sh/create_landseg_table.sh)
      - Expects `filename="/media/model/p6/out/land/$scenario/eos/${landseg}_0111-0211-0411.csv"`
      - Use: create_landseg_table.sh [landseg] [scenario]
      - Ex: `create_landseg_table.sh $i CBASE1808L55CY55R45P50R45P50Y`
      - Note: this routine can be run from anywhere on the system; it will look for variables in hspf.config, defaulting to the values in /etc/hspf.config, to find the location of the data files to insert into the database table, and the database table template to use with that version of the model (this fallback pattern is sketched below).
    - [x] [create_all_landseg_runoffs-p6.sh](https://github.com/HARPgroup/om/blob/master/sh/create_all_landseg_runoffs-p6.sh)
    - [x] [create_all_landseg_runoffs-p5.sh](https://github.com/HARPgroup/om/blob/master/sh/create_all_landseg_runoffs-p5.sh)
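
The scripts above are the real workflow; the sketches below just illustrate a few of the techniques they describe. First, since step 1 notes there is no multi-year download script yet, here is a minimal sketch of a get_nldas_to_date-style loop. It reuses the wget syntax quoted in step 1 and assumes ~/.urs_cookies is already configured; the argument handling is illustrative, not the actual script's.

#!/bin/bash
# sketch only: fetch one year of NLDAS2 data, one julian-day directory at a time
year=$1
end_jday=${2:-$(date +%j)}   # ending julian day, defaults to today

for jday in $(seq -f "%03g" 1 $end_jday); do
  wget --load-cookies ~/.urs_cookies --auth-no-challenge=on \
    --keep-session-cookies -np -r -NP -R "*.xml" -c -N --content-disposition \
    "https://hydro1.gesdisc.eosdis.nasa.gov/data/NLDAS/NLDAS_FORA0125_H.002/$year/$jday/"
done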
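
Second, the "find grid cells in seg_maps" technique referenced for grid2land.sh and a2l_one (steps 2-3) boils down to something like the following. The seg_maps layout shown in the comment is an assumption for illustration; the real file format may differ.

#!/bin/bash
# hypothetical sketch of the seg_maps lookup technique from steps 2-3
# ASSUMED seg_maps format (one line per land segment, then its grid cells):
#   A51031 x393y93 x394y93 x393y94
landseg=$1
seg_maps=${2:-/backup/meteorology/seg_maps}

# pull only this land segment's line into a temporary file
tmpfile=$(mktemp)
grep "^$landseg " "$seg_maps" > "$tmpfile"

# run the per-cell routine (g2a_one.bash, step 2) for just those cells
for cell in $(cut -d' ' -f2- "$tmpfile"); do
  bash g2a_one.bash 1984010100 2020123123 /backup/meteorology \
    /backup/meteorology/out/grid_met_csv "$cell"
done
rm "$tmpfile"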
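
Third, the hspf.config behavior described for create_landseg_table.sh above (use a local file when present, fall back to /etc/hspf.config) is essentially this pattern; the variable echoed at the end is illustrative, not a documented key.

# sketch of the local-config-with-global-fallback pattern
if [ -f ./hspf.config ]; then
  . ./hspf.config        # local override, e.g. a per-model-directory config
else
  . /etc/hspf.config     # global default (currently points at p532c-sova)
fi
echo "using model directory: $CBP_ROOT"   # illustrative variable name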

### Met Scenario Creation Script for HSPF/CBP model
No longer relevant -- this has been supplanted by `wdm_pm_one`

- Located in `/opt/model/p53/p532_alex/bin/make_met_scenario.sh`
- Usage: `make_met_scenario.sh start end met_name prad_name nldas_dir model_dir`
- See: https://github.com/HARPgroup/cbp_wsm/blob/master/run/make_met_scenario.sh

#### Script Prototype

#!/bin/bash
# arguments: date range, met & prad scenario names, source dir, model dir
start_date=$1
end_date=$2
met_name=$3
prad_name=$4
nldas_dir=$5
model_dir=$6

# code to run the WDM creation goes here

# move the met WDMs
met_dir="$model_dir/input/scenario/climate/met/$met_name"
mkdir -p "$met_dir"
cp $nldas_dir/met*.wdm "$met_dir/"

# move the prad WDMs
prad_dir="$model_dir/input/scenario/climate/prad/$prad_name"
mkdir -p "$prad_dir"
cp $nldas_dir/prad*.wdm "$prad_dir/"



#### How to use the nohup command in Linux:

[note](https://github.com/HARPgroup/HARParchive/projects/2#card-78504305) on the _data models & code_ section of the project
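
The long-running steps above (a full p5_g2a_all conversion, or regenerating all WDMs) are good candidates for nohup, which keeps a job running after you log out. A typical invocation, using the p5_g2a_all call from step 2 as the example:

# run in the background, immune to hangup, with output captured to a log
nohup p5_g2a_all 19840101 20201231 /backup/meteorology \
  /backup/meteorology/out/grid_met_csv > p5_g2a_all.log 2>&1 &
# check on it later:
tail -f p5_g2a_all.log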
rburghol commented 2 years ago

Testing:

/opt/model/p53/p532_alex/bin/make_met_scenario.sh 19840101 20211231 nldas2_20211221 prad_20211221 /opt/model/p53/p532_alex/code/src/prad_met /opt/model/p53/p532c-sova
rburghol commented 2 years ago

Test using new cbp exec framework. Successfully copies WDMs over.

cbp make_met_scenario.sh 19840101 20211231 nldas1121 p20211221 /opt/model/p53/p532_alex/code/src/prad_met /opt/model/p53/p532c-sova
rburghol commented 2 years ago

Testing with single missing segment:

# get all the data for this grid cell
grid2land.sh 19840101 20201231 /backup/meteorology /backup/meteorology/out/grid_met_csv A51031
# data was bad for 1984, cell x369y99 was empty, so reran
grid2land.sh 19840101 19841231 /backup/meteorology /backup/meteorology/out/grid_met_csv A51031
# bad data in 1986 - totally huge values for ET
grid2land.sh 19860101 19861231 /backup/meteorology /backup/meteorology/out/grid_met_csv A51031
# turn grid data into land segment CSV
a2l_one A51031
# since this is a full time period run, create a summary RNMax file
LongTermAvgRNMax /backup/meteorology/out/lseg_csv/1984010100-2020123123 /backup/meteorology/out/lseg_csv/RNMax 1 A51031

# create WDMs
# wdm_pm_one looks for an hspf.config file to find wdm file paths
wdm_pm_one A51031 1984010100 2020123123 nldas2 harp2021 nldas1221 p20211221

# copy WDMs into project

# run the river segment
cbp run_all.csh p532sova_cal OR2_7670_7840
rburghol commented 2 years ago

Testing with a full basin

# run all data conversion
# this is a single call that should cover all land segs in all rivers, and be done no more than once per month.
p5_g2a_all 19840101 20201231 /backup/meteorology /backup/meteorology/out/grid_met_csv
# this is good since we won't have to do it again for a few weeks or so, AND we should only need to do 2021+,
# since data in out/grid_met_csv/ is stored by year -- so we only update a single year, the most recent one, ex:
#p5_g2a_all 20200101 20211231 /backup/meteorology /backup/meteorology/out/grid_met_csv

# get list of land segments needed
cd /opt/model/p53/p532c-sova/
segs=`cbp get_landsegs OR7_8490_0000`
for i in $segs; do
  # convert raw grid data into CSVs
  # no need to call grid2land.sh because we did ALL grids above
  # ./grid2land.sh 1984010100 2020123123 /backup/meteorology /backup/meteorology/out/grid_met_csv $i
  # convert grid CSVs into land segment CSVs
  a2l_one 1984010100 2020123123 /backup/meteorology/out/grid_met_csv /backup/meteorology/out/lseg_csv $i
  # update long term averages
  LongTermAvgRNMax /backup/meteorology/out/lseg_csv/1984010100-2020123123 /backup/meteorology/out/lseg_csv/RNMax 1 $i
  # finally, create a WDM for each land seg
  # this script reads the file /etc/hspf.config to get directories.
  # later, we will create separate directories to call these scripts from, and each will have its own hspf.config file, allowing us to automatically put the files in the right place. For now, the global /etc/hspf.config file is the fallback, and defaults to p532c-sova
  wdm_pm_one $i 1984010100 2020123123 nldas2 harp2021 nldas1221 p20211221
done

# Run them
cbp run_all.csh p532sova_2021 OR7_8490_0000
rburghol commented 2 years ago

Go up to date (@jdkleiner):

rburghol commented 2 years ago

Update 6/2022