rburghol opened this issue 3 years ago
Big Questions Answers
NLDAS_GRIB_to_ASCII
We are extracting 2015-present for every grid, and 2005-2014 for just the grids for the southern rivers.
NLDAS_ASCII_to_LSEGS
Hey Rob,
Excellent - thanks for the update! Glad to see I misinterpreted the precip status!
http://deq1.bse.vt.edu:81/met/ - web address for /backup/meteorology directory
Various docs/resources that we have used for QA
Helpful links for null values: [GHCN-Daily readme](https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt) and open-data-docs/docs/noaa/noaa-ghcn at main · awslabs/open-data-docs (github.com)
7/15/2021 Update
When batch running the land segments, we discovered that we had missed some grids when using the NLDAS2_GRIB_to_ASCII function. The missing grids are not inside VA's minor basins; however, they are inside land segments that are partially in VA's minor basins.
All of the missing grids are currently being extracted for 1984-2020 (I used the handy nohup trick Rob showed us yesterday, so it should run all night even after I get signed out of deq4). I will probably log in for a couple of minutes tomorrow, once the grids have finished downloading, and batch run NLDAS2_ASCII_to_LSegs so that on Monday we can begin implementing our PET calculations with all of our data!
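For anyone who missed it, the nohup pattern keeps a job running after you log out of deq4. A minimal sketch, assuming it matches what Rob showed (we can't confirm his exact command); `run_detached` is a hypothetical helper name, not an existing script:

```shell
# Start a long job detached from the terminal so it survives logout.
# Usage: run_detached LOGFILE COMMAND [ARGS...]   (hypothetical helper)
run_detached() {
  local log="$1"
  shift
  # nohup ignores SIGHUP; stdout and stderr go to the log instead of nohup.out
  nohup "$@" > "$log" 2>&1 &
  echo "started PID $! (log: $log)"
}

# Hypothetical usage; substitute the real extraction command and grid:
# run_detached g2a_x382y101.log ./g2a_one.bash 1984010100 2020123123 /backup/meteorology /backup/meteorology/out/grid_met_csv x382y101
```

Afterwards, `tail -f` on the log file shows progress when you log back in.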
UPDATE ON MISSING VALUES IN DATA
NLDAS vs RNOAA Difference Graphs
Example for One Month: July 2017
Example for Total Monthly Precipitation 2017
We ran into a couple of land segments with missing data:
The problem seems to be that some of the 2008 grid data didn't finish downloading. We are currently redownloading and updating deq4 with complete time series data for both the grids and the corresponding land segments. The fact that our function caught the missing data is a good sign, and we should be able to fix the issue and have all the ET CSV files on deq4 by the meeting on Monday.
7/26/2021 Update:
Here is a graph comparing the different potential evapotranspiration (PET) methods. All of the land segments and years I have graphed show similar trends: Gopal's PET is larger during the summer, the Hamon method is smaller during the summer, and Hargreaves-Samani is kind of all over the place.
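For reference, the commonly published Hargreaves-Samani form is below. We are assuming our implementation follows it, but it may differ in units or coefficients. Because the estimate scales with the square root of the daily temperature range, day-to-day swings in $T_{max} - T_{min}$ may be part of why it looks erratic next to the other methods.

$$
ET_0 = 0.0023\, R_a \left( T_{mean} + 17.8 \right) \sqrt{T_{max} - T_{min}}, \qquad T_{mean} = \frac{T_{max} + T_{min}}{2}
$$

Here $ET_0$ is in mm/day, $R_a$ is extraterrestrial radiation expressed as equivalent evaporation (mm/day), and temperatures are in °C.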
9/20 update on issue that was previously being tracked in https://github.com/HARPgroup/HARParchive/issues/122
The missing-data issue we were dealing with over the summer involved two land segments that are not near the land segments in the new problem we have been discussing. Therefore, the possibility that the previous NLDAS2_ASCII_to_LSegs run used bad grid data does not explain the bad precip time series data.
Here is a side-by-side comparison of the same time period in the old and new data. As far as I can tell, there is no pattern.
Table:
Plot:
Line plots: One day: Whole timeseries:
Log plots: One day: Whole timeseries:
After reviewing the summary stats, we found that some values in the precip_annual and 90_day_max_precip columns are extremely high. Searching and filtering each land segment by these columns will be a QA test to run.
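One way to recompute the 90-day column for a single land segment, assuming 90_day_max_precip means the largest precipitation total over any 90 consecutive days, and that the land segment file is an hourly CSV laid out like the A51800.RAD file read later in this thread (yr,mo,da,hr,value; assumed no header row, chronological order). `max_90day_precip` is a hypothetical helper:

```shell
# Largest precipitation total over any 90 consecutive days (hypothetical helper).
# $1: hourly CSV (yr,mo,da,hr,value), chronological, no header (assumed)
max_90day_precip() {
  awk -F',' '
    # add one daily total to a rolling 90-day window kept in a circular buffer
    function push(v) {
      slot = nd % 90
      win += v - buf[slot]
      buf[slot] = v
      nd++
      if (nd >= 90 && win > best) best = win
    }
    BEGIN { best = -1 }
    NF {
      day = $1 "-" $2 "-" $3
      # a new day starts: push the finished daily total
      if (seen && day != prev) { push(daysum); daysum = 0 }
      seen = 1
      prev = day
      daysum += $5
    }
    END {
      if (seen) push(daysum)
      if (nd < 90) { print "need at least 90 days of data" > "/dev/stderr"; exit 1 }
      printf "%.4f\n", best
    }' "$1"
}
```

Comparing this number against the summary-stats column for a suspicious segment shows whether the extreme value comes from the input data or from the stats step.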
Another way of flagging anomalies in the data: find the lower and upper quartiles (Q1, Q3) of the data set, compute IQR = Q3 - Q1, and flag any value more than 1.5*IQR below Q1 or above Q3 as an outlier.
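A minimal sketch of the 1.5*IQR rule, assuming the values to screen (for example annual precipitation totals) are in a file with one number per line. Quartiles use linear interpolation between order statistics, which should match R's default `quantile()` (type 7); `iqr_outliers` is a hypothetical helper name:

```shell
# Print values more than 1.5*IQR below Q1 or above Q3 (hypothetical helper).
# $1: file with one numeric value per line
iqr_outliers() {
  sort -g "$1" | awk '
    # type-7 quantile: interpolate at position (n-1)*p + 1 (1-based)
    function q(p,   h, lo) {
      h = (n - 1) * p + 1
      lo = int(h)
      if (lo >= n) return x[n]
      return x[lo] + (h - lo) * (x[lo + 1] - x[lo])
    }
    NF { x[++n] = $1 + 0 }
    END {
      if (n == 0) exit 1
      q1 = q(0.25); q3 = q(0.75); iqr = q3 - q1
      lower = q1 - 1.5 * iqr; upper = q3 + 1.5 * iqr
      for (i = 1; i <= n; i++)
        if (x[i] < lower || x[i] > upper) print x[i]
    }'
}
```

For the values 1 through 10 plus 100, Q1 = 3.5 and Q3 = 8.5, so the fences are -4 and 16 and only 100 is printed.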
Searching through all of the land segment data and flagging yearly precipitation totals greater than 150 inches flagged 30 land segment-years. For whatever reason, 2008 seems to have been a problem year. However, this used the data from before we reran the function, which fixed the 2 land segments we have been looking at. It will be interesting to see whether the data is now fixed for every one of these land segments too (this is also a reminder for me to do that tomorrow).
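A sketch of the annual check for one land segment, assuming an hourly precip CSV (yr,mo,da,hr,value) in inches named after the segment (the `A51800.PRC`-style name is an assumption based on the RAD file and the PRC metric mentioned in this thread). `flag_annual_precip` is a hypothetical helper; it prints lines in a `LSEG, YEAR, TOTAL` format:

```shell
# Flag years whose precipitation total exceeds a threshold (hypothetical helper).
# $1: hourly precip CSV for one land segment (yr,mo,da,hr,value), assumed inches
# $2: threshold in the same units (default 150)
flag_annual_precip() {
  local file="$1" limit="${2:-150}" lseg
  lseg=$(basename "$file")
  lseg="${lseg%.*}"   # e.g. A51800.PRC -> A51800 (assumed naming)
  awk -F',' -v lseg="$lseg" -v limit="$limit" '
    NF { total[$1 + 0] += $5 }
    END {
      for (y in total)
        if (total[y] > limit + 0) printf "%s, %s, %.2f\n", lseg, y, total[y]
    }' "$file" | sort -t',' -k2,2n
}
```

Looping it over every segment (the directory and `*.PRC` pattern are assumptions) gives a file comparable to FlaggedLsegs.txt:

```shell
for f in /backup/meteorology/out/lseg_csv/1984010100-2020123123/*.PRC; do
  flag_annual_precip "$f"
done > FlaggedLsegs.txt
```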
Here is the .txt with year and land segment: FlaggedLsegs.txt
@kylewlowe great outcome ^^. Eagerly anticipating the re-run that you do and see if that fixes many of these.
Update on rerunning the flagged-segments function:
All of the 2008 values appear to be fixed after the rerun. However, the two 1985 land segments did not change. We checked the data for the corresponding grids to see whether the raw 1985 meteorological data had downloaded incorrectly, and found a grid that only has data up to hour 10 on June 10th. The grid is x382y101, which is in both land segments. This was probably caused either by the grib_to_ascii function not finishing or by the raw NLDAS download not finishing while running over the summer. We will keep working out which of these is the problem and redownload the necessary data tomorrow.
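A truncated grid like x382y101 can be caught by counting rows per grid-year file. This sketch assumes each file holds one whitespace-separated row per hour (`year mo da hr value`, as in the x385y94zTT.txt file read in R below), so a complete year has 8760 rows, or 8784 in a leap year. `check_grid_year` is a hypothetical helper:

```shell
# Report whether a one-year grid file has every hourly row (hypothetical helper).
# $1: e.g. /backup/meteorology/out/grid_met_csv/1985/x382y101zTT.txt
check_grid_year() {
  local file="$1"
  awk -v f="$file" '
    NF { n++; yr = $1 + 0; last = $1 "-" $2 "-" $3 " hour " $4 }
    END {
      if (n == 0) { print f ": EMPTY"; exit 1 }
      leap = (yr % 4 == 0 && yr % 100 != 0) || yr % 400 == 0
      expected = leap ? 8784 : 8760
      if (n != expected) {
        # the last timestamp shows where the download or parse stopped
        printf "%s: INCOMPLETE rows=%d expected=%d last=%s\n", f, n, expected, last
        exit 1
      }
      print f ": OK"
    }' "$file"
}
```

Running `for f in /backup/meteorology/out/grid_met_csv/1985/*.txt; do check_grid_year "$f"; done | grep -v ': OK$'` lists only the incomplete files. A year that is still in progress will also show up as incomplete.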
Update on Timeseries QA after reimporting data
All data checked out, with nothing overly unusual. Flagged segment txt files for each metric (DPT, PRC, etc.) are located in the /backup/meteorology directory for viewing individual values.
The number of flagged data points is as follows:
The test values used were the same values used before the database reset. They are as follows:
./a2l_test 1984010100 2020123123 /backup/meteorology/out/grid_met_csv /backup/meteorology/out/lseg_csv A51800
wdm_pm_one x385y94 1984010100 2020123123 nldas2 harp2021 nldas1221 p20211221
x385y94, multiple, ex: DDPT, x385y94, 1986 , -40.3535233
./g2a_one.bash 1986010100 1986123123 /backup/meteorology /backup/meteorology/out/grid_met_csv x385y94
grid2land.sh 1986010100 1986123123 /backup/meteorology /backup/meteorology/out/grid_met_csv A51800
a2l_one 1984010100 2020123123 /backup/meteorology/out/grid_met_csv /backup/meteorology/out/lseg_csv A51800
LongTermAvgRNMax /backup/meteorology/out/lseg_csv/1984010100-2020123123 /backup/meteorology/out/lseg_csv/RNMax 1
R code to examine the equations:
library("sqldf")
# from fgrep DPT /opt/model/model_meteorology/nldas2/NLDAS2_ASCII_to_LSegs.cpp
# we get:
# DPT = 237.7 * ( (17.271*TMP/(237.7+TMP)) + log(RHX) ) / (17.271 - ( (17.271*TMP/(237.7+TMP)) + log(RHX) ));
# read temp
tmp <- read.table("out/grid_met_csv/1986/x385y94zTT.txt");
# read rh
rh <- read.table("out/grid_met_csv/1986/x385y94zRH.txt");
names(rh) <- c('year','mo','da','hr','value')
names(tmp) <- c('year','mo','da','hr','value')
dpt <- sqldf(
"
select a.year, a.mo, a.da, a.hr, a.value as temp, b.value as rh
from tmp as a
left outer join rh as b
on (
a.year = b.year
and a.mo = b.mo
and a.da = b.da
and a.hr = b.hr
)
")
dpt$dpt <- 237.7 * ( (17.271*dpt$temp/(237.7+dpt$temp)) + log(dpt$rh) ) / (17.271 - ( (17.271*dpt$temp/(237.7+dpt$temp)) + log(dpt$rh) ))
quantile(dpt$dpt, na.rm = TRUE) # the left join leaves NA wherever an hour has no RH record
0% 25% 50% 75% 100%
-19.296831 3.598815 11.937289 19.029749 25.111637
R Examine Data.
rad <- read.csv('/opt/model/p53/p532c-sova/input/unformatted/nldas2/harp2021/1984010100-2020123123/A51800.RAD')
names(rad) <- c('yr', 'mo', 'da', 'hr', 'value')
quantile(rad$value)
0% 25% 50% 75% 100%
0.0000 0.0000 0.0925 29.4794 92.0369
./nldas_land_cells N51053
a2l_test (will create the CSV files for each cell):
sdate=1984010100
edate=2022123123
./a2l_test $sdate $edate /backup/meteorology/out/grid_met_csv /backup/meteorology/out/lseg_csv N51053
If you get an hourly error like PROBLEM ERROR Hourly data outside valid range, and a summary error for a year like HPET, x379y95, 2021 , 9657.27148 with the message PROBLEM ERROR Annual data out of range, you need to try regenerating the grid from NLDAS2 data:
./g2a_one.bash 2021010100 2021123123 /backup/meteorology /backup/meteorology/out/grid_met_csv x379y95
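When the QA run reports several grid-years like the HPET line above, the matching regeneration commands can be built from the error output instead of typed by hand. A minimal sketch, assuming the `METRIC, GRID, YEAR , VALUE` line format shown in this thread; `regen_commands` is a hypothetical helper that only prints commands (dry run), one per unique grid-year:

```shell
# Read QA output on stdin and print one g2a_one.bash command per unique
# grid/year found on "METRIC, GRID, YEAR , VALUE" lines (hypothetical helper).
regen_commands() {
  awk -F',' '
    $2 ~ /^ *x[0-9]+y[0-9]+ *$/ && $3 ~ /^ *[0-9][0-9][0-9][0-9] *$/ {
      grid = $2; gsub(/ /, "", grid)
      yr = $3;   gsub(/ /, "", yr)
      key = grid " " yr
      if (!(key in seen)) {
        seen[key] = 1
        printf "./g2a_one.bash %s010100 %s123123 /backup/meteorology /backup/meteorology/out/grid_met_csv %s\n", yr, yr, grid
      }
    }'
}
```

For example, piping the `wdm_pm_one` output through `regen_commands` turns `HPET, x379y95, 2021 , 9657.27148` into exactly the g2a_one.bash command above; lines such as `data= 10.4600000` are ignored. Review the list before running it.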
LongTermAvgRNMax /backup/meteorology/out/lseg_csv/${sdate}-${edate} /backup/meteorology/out/lseg_csv/RNMax 1 x379y95
nano out/grid_met_csv/2021/x380y99zET.txt
Or regenerate all the grid cell CSVs in the land segment with grid2land.sh:
grid2land.sh 2021010100 2021123123 /backup/meteorology /backup/meteorology/out/grid_met_csv N51053
a2l_one 1984010100 2022123123 /backup/meteorology/out/grid_met_csv /backup/meteorology/out/lseg_csv N51053
cd /opt/model/p6/vadeq/
LongTermAvgRNMax /backup/meteorology/out/lseg_csv/1984010100-2022123123 /backup/meteorology/out/lseg_csv/RNMax 1 N51053
wdm_pm_one N51053 1984010100 2022123123 nldas2 harp2021 nldas1221 p20211221
# shows
PROBLEM ERROR Hourly data outside valid range
data= 10.4600000
PROBLEM ERROR Hourly data outside valid range
data= 10.3100004
...
HPET, x379y95, 2021 , 9657.27148 PROBLEM ERROR Annual data out of range
Still had a problem: some grid cells were fixed, others were not. Land segments involved: N51029, N51135, N51049, N51011
./get_nldas_to_date 2021 49 1
Run g2a_one.bash for each cell?
./g2a_one.bash 2021010100 2021123123 /backup/meteorology /backup/meteorology/out/grid_met_csv x377y96
Batch process:
basin=JA5_7480_0001
segs=`cbp get_landsegs $basin`
badstart=2022010100
badend=2022123123
dstart=1984010100
dend=2022123123
i=N51049
./nldas_land_grids $i
# 10 x377y96 x375y97 x376y97 x377y97 x375y98 x376y98 x377y98 x378y98 x375y99 x376y99
# update all the grid cell CSVs in the land segment
grid2land.sh $badstart $badend /backup/meteorology /backup/meteorology/out/grid_met_csv $i
# just update a single cell:
# ./g2a_one.bash $badstart $badend /backup/meteorology /backup/meteorology/out/grid_met_csv x376y99
# weight all grid cells into the land segment
a2l_one 1984010100 2022123123 /backup/meteorology/out/grid_met_csv /backup/meteorology/out/lseg_csv $i
LongTermAvgRNMax /backup/meteorology/out/lseg_csv/${dstart}-${dend} /backup/meteorology/out/lseg_csv/RNMax 1 $i
wdm_pm_one $i $dstart $dend nldas2 harp2021 nldas1221 p20211221
nano out/grid_met_csv/2022/x376y97zET.txt
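The batch process above computes `segs` but then processes a single land segment (`i=N51049`). A sketch of looping the same four steps over every segment, assuming `cbp get_landsegs` prints whitespace-separated segment IDs. `batch_fix_lsegs` is a hypothetical wrapper: it only echoes the commands unless `RUN=1` is set, so the list can be checked first.

```shell
# Repeat the per-segment fix for several land segments (hypothetical wrapper).
# $1/$2: start/end of the bad period; remaining args: land segment IDs
batch_fix_lsegs() {
  local badstart="$1" badend="$2"
  shift 2
  local dstart=1984010100 dend=2022123123
  local met=/backup/meteorology
  local pre=echo i
  if [ "${RUN:-0}" = 1 ]; then pre=; fi   # empty prefix = really run
  for i in "$@"; do
    # update all the grid cell CSVs in the land segment for the bad period
    $pre grid2land.sh "$badstart" "$badend" "$met" "$met/out/grid_met_csv" "$i"
    # weight all grid cells into the land segment
    $pre a2l_one "$dstart" "$dend" "$met/out/grid_met_csv" "$met/out/lseg_csv" "$i"
    $pre LongTermAvgRNMax "$met/out/lseg_csv/${dstart}-${dend}" "$met/out/lseg_csv/RNMax" 1 "$i"
    $pre wdm_pm_one "$i" "$dstart" "$dend" nldas2 harp2021 nldas1221 p20211221
  done
}
```

Usage, with the basin from the example above: `batch_fix_lsegs 2022010100 2022123123 $(cbp get_landsegs JA5_7480_0001)`, then repeat with `RUN=1` in front once the printed commands look right. The earlier single-segment example ran the last two steps from /opt/model/p6/vadeq/, so you may need to `cd` there first.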
Big questions:
nldas_datasets
: om-model-info/6863472/dh_properties
1984010100-2020123123
: om-model-info/6863473/dh_properties
Rscript R/lseg_qa_test_timeseries.R hydrocode dataset ftype model_code
Rscript R/lseg_qa_test_timeseries.R A37135 1984010100-2020123123 cbp532_landseg cbp-5.3.2
nldas_feature_dataset_prop()
DDPT, x385y94, 1986 , -40.3535233
DDPT, x386y94, 1986 , -40.3506203
DDPT, x386y95, 1986 , -40.3687019
DDPT, x387y95, 1986 , -40.2827759
DDPT, x388y95, 1986 , -40.0947647
DDPT, x388y96, 1986 , -40.0809441
DDPT, x389y96, 1986 , -39.9123192
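To tally flagged rows like the DDPT list above per metric and year (e.g. for the flagged-point counts), a minimal sketch assuming every flagged line follows the `METRIC, GRID, YEAR , VALUE` format; `count_flags` is a hypothetical helper:

```shell
# Count flagged rows per metric and year from stdin (hypothetical helper).
# Prints "METRIC YEAR COUNT", one line per combination.
count_flags() {
  awk -F',' '
    NF >= 4 {
      m = $1; y = $3
      gsub(/ /, "", m); gsub(/ /, "", y)
      n[m " " y]++
    }
    END { for (k in n) print k, n[k] }' | sort
}
```

For the seven lines above it prints `DDPT 1986 7`. It can be run on the flagged segment txt files with `cat *.txt | count_flags` (file names assumed).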
QA Scripts/Code Samples
Find -9999 in any file in downloaded and parsed grid cell data
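A sketch, assuming missing values were written as -9999 in the parsed grid files (the sentinel the GHCN-Daily readme linked above uses for missing data). The pattern matches -9999 only as a whole field, optionally followed by zeros after a decimal point, so values such as -99990 are not reported. `find_missing_flags` is a hypothetical helper:

```shell
# List files under a directory that contain -9999 as a whole field (hypothetical helper).
# $1: root directory, e.g. /backup/meteorology/out/grid_met_csv
find_missing_flags() {
  grep -rlE -- '(^|[[:space:],])-9999(\.0+)?([[:space:],]|$)' "$1"
}
```

Swapping `-l` for `-n` prints the matching lines with line numbers, which shows the exact hours affected.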