HARPgroup / HARParchive

This repo houses HARP code development items, resources, and intermediate work products.

Weather Data QA #72

Open rburghol opened 3 years ago

rburghol commented 3 years ago

Big questions:

QA Scripts/Code Samples

Find -9999 in any file in downloaded and parsed grid cell data

cd /backup/meteorology
# -e marks "-9999" as the pattern; without it, fgrep parses -9999 as an option
fgrep -R -e "-9999" ./out/grid_met_csv/*
alexwlowe commented 3 years ago

Big Questions Answers

rburghol commented 3 years ago
alexwlowe commented 3 years ago

Hey Rob,

rburghol commented 3 years ago

Excellent - thanks for the update! Glad to see I misinterpreted the precip status!

alexwlowe commented 3 years ago

http://deq1.bse.vt.edu:81/met/ - web address for /backup/meteorology directory

alexwlowe commented 3 years ago

Various docs/resources that we have used for QA

katealbi11 commented 3 years ago

Helpful links for null values: [GHCN-Daily readme](https://www1.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt) and open-data-docs/docs/noaa/noaa-ghcn at main · awslabs/open-data-docs (github.com)

alexwlowe commented 3 years ago

7/15/2021 Update

When batch running the land segments, we discovered that we had missed some grids when using the NLDAS2_GRIB_to_ASCII function. The missing grids are not inside VA's minor basins; however, they are inside land segments that are partially in VA's minor basins.

All of the missing grids are currently being extracted for 1984-2020 (I used the handy nohup trick Rob showed us yesterday, so it should run all night even when I get signed out of deq4). I will probably log in for a couple of minutes tomorrow when the grids are finished downloading and batch run NLDAS2_ASCII_to_LSegs, so that on Monday we should be able to begin implementing our PET calculations with all of our data!

UPDATE ON MISSING VALUES IN DATA

katealbi11 commented 3 years ago

NLDAS vs RNOAA Difference Graphs

Example for One Month: July 2017

Screen Shot 2021-07-13 at 11 59 32 AM

Example for Total Monthly Precipitation 2017

Screen Shot 2021-07-14 at 11 11 34 AM
kylewlowe commented 3 years ago

We ran into a couple land segments with missing data:

The problem seems to be that some of the grid data didn't finish downloading in 2008. We are currently redownloading and updating deq4 with complete timeseries data for both grids and corresponding land segments. The fact that our function caught the missing data is a good sign and we should be able to fix the issue and have all the ET csv files on deq4 by the meeting on Monday.

7/26/2021 Update:

alexwlowe commented 3 years ago

image

Here is a graph comparing the different potential evapotranspiration (PET) methods. All of the land segments and years I have graphed show similar trends: Gopal's PET is larger during the summer, the Hamon method is smaller during the summer, and Hargreaves-Samani is kind of all over the place.

kylewlowe commented 2 years ago

9/20 update on issue that was previously being tracked in https://github.com/HARPgroup/HARParchive/issues/122

The missing-data issue we dealt with over the summer involved two land segments that are not near the land segments in the new problem we have been discussing. Therefore, bad grid data in the previous NLDAS2_ASCII_to_LSegs run is not what caused the bad precip time series data.

Here is the side-by-side comparison of the same time period between the old and new data. There seems to be no pattern from what I can see.

Table: Screenshot (124)

Plot:

Line plots: One day: image Whole timeseries: image

Log plots: One day: image Whole timeseries: image

A review of the summary stats shows that the precip_annual and 90_day_max_precip columns are extremely high. Searching and filtering each land segment by these columns will be a QA test to run.

katealbi11 commented 2 years ago

Another way of visualizing anomalies in the data: find the lower and upper quartiles of the data set (Q1 and Q3), compute the IQR (Q3 - Q1), and flag any value that falls below Q1 - 1.5*IQR or above Q3 + 1.5*IQR as an outlier.
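A minimal shell sketch of the IQR rule, for illustration only. It assumes a file with one numeric value per line, and it uses linear-interpolation quartiles (the same rule as R's default `quantile()`); the function name `iqr_outliers` and the sample numbers are made up for the example and are not part of the HARP scripts.

```shell
#!/bin/sh
# iqr_outliers FILE   (hypothetical helper, not part of the HARP tooling)
# Prints every value in FILE (one number per line) that lies more than
# 1.5*IQR below the first quartile or more than 1.5*IQR above the third.
iqr_outliers() {
  sort -n "$1" | awk '
    # Quantile by linear interpolation between the two nearest sorted values
    function q(p,    h, lo) {
      h = (n - 1) * p + 1
      lo = int(h)
      if (lo >= n) return v[n]
      return v[lo] + (h - lo) * (v[lo + 1] - v[lo])
    }
    NF { v[++n] = $1 }            # skip blank lines, keep sorted values
    END {
      if (n == 0) exit
      q1 = q(0.25); q3 = q(0.75); iqr = q3 - q1
      for (i = 1; i <= n; i++)
        if (v[i] < q1 - 1.5 * iqr || v[i] > q3 + 1.5 * iqr) print v[i]
    }'
}

# Made-up sample: five ordinary values and one suspicious spike
sample=$(mktemp)
printf '1\n2\n3\n4\n5\n100\n' > "$sample"
iqr_outliers "$sample"
rm -f "$sample"
```

Different quartile definitions shift the fences slightly on small samples, so borderline values may be flagged by one tool and not another.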

kylewlowe commented 2 years ago

Searching through all of the land segment data and flagging yearly precipitation values greater than 150 inches turned up 30 land-segment years. For whatever reason, 2008 seems to have been a problem year. However, this uses the data from before we reran the function, which fixed the 2 land segments we have been looking at. It will be interesting to see whether the data is now fixed for every one of these land segments too (this is also a reminder for me to do that tomorrow).

Here is the .txt with year and land segment: FlaggedLsegs.txt
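The annual threshold check could look roughly like the sketch below. This is not the function that produced FlaggedLsegs.txt; it assumes a comma-separated land segment file with columns year, month, day, hour, value, and it assumes the value column is precipitation in inches per time step. The name `flag_annual_precip` and the sample rows are hypothetical.

```shell
#!/bin/sh
# flag_annual_precip FILE [LIMIT]   (hypothetical helper)
# Sums column 5 by year (column 1) and prints "year,total" for every year
# whose total exceeds LIMIT (default 150, assumed to be inches).
flag_annual_precip() {
  awk -F, -v limit="${2:-150}" '
    NF >= 5 { total[$1] += $5 }
    END {
      for (y in total)
        if (total[y] > limit) printf "%s,%.2f\n", y, total[y]
    }
  ' "$1" | sort
}

# Made-up rows: 2007 totals 40, 2008 totals 200
sample=$(mktemp)
printf '2007,1,1,0,40\n2008,1,1,0,120\n2008,1,2,0,80\n' > "$sample"
flag_annual_precip "$sample"
rm -f "$sample"
```

Looping this over every land segment file and prefixing each line with the segment name would give a year/segment list of the same general shape as the flagged file above.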

rburghol commented 2 years ago

@kylewlowe great outcome ^^. Eagerly anticipating your re-run to see whether it fixes many of these.

kylewlowe commented 2 years ago

Update on rerunning of flagged segments function:

All of the 2008 values appear to have been fixed by the rerun. However, the two 1985 land segments did not change. We checked the corresponding grid data to see whether the raw meteorological data for 1985 downloaded incorrectly, and found a grid whose data only runs through hour 10 on June 10th. The grid is x382y101, which is in both land segments. This was probably caused either by the grib_to_ascii function not finishing or by the raw NLDAS download not finishing when it ran over the summer. We will keep working to figure out which of these is the problem and redownload the necessary data tomorrow.
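A truncated grid file like this one can be caught by comparing its last timestamp with the expected end of the series. The sketch below is only an illustration: it assumes each row starts with year, month, day, hour (comma- or whitespace-separated) and that hours run 0-23, and the helper names `last_timestamp` and `grid_is_complete` are hypothetical.

```shell
#!/bin/sh
# last_timestamp FILE   (hypothetical helper)
# Prints the timestamp (YYYYMMDDHH) of the last data row in FILE.
last_timestamp() {
  awk '
    { gsub(/,/, " ") }   # accept comma- or whitespace-separated rows
    NF >= 4 { last = sprintf("%04d%02d%02d%02d", $1, $2, $3, $4) }
    END { if (last != "") print last }
  ' "$1"
}

# grid_is_complete FILE EXPECTED_END   (hypothetical helper)
# Succeeds only if the last row of FILE is stamped EXPECTED_END.
grid_is_complete() {
  [ "$(last_timestamp "$1")" = "$2" ]
}

# Made-up file that stops at hour 10 on June 10th, 1985
sample=$(mktemp)
printf '1985 6 10 9 0.0\n1985 6 10 10 0.0\n' > "$sample"
if ! grid_is_complete "$sample" 1985123123; then
  echo "truncated: last row is $(last_timestamp "$sample")"
fi
rm -f "$sample"
```

A check like this only tells you where a file stops; it does not say whether the download or the conversion step was the one that failed.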

kylewlowe commented 2 years ago

Update on Timeseries QA after reimporting data

All data checked out with nothing being overly unusual. Flagged segment txt files for each metric (DPT, PRC, etc.) are located in the /backup/meteorology directory for viewing of individual values.

The number of flagged data points are as follows:

The test values used were the same values used before the database reset. They are as follows:

rburghol commented 2 years ago

R code to examine the equations.

library("sqldf")
# from fgrep DPT /opt/model/model_meteorology/nldas2/NLDAS2_ASCII_to_LSegs.cpp
# we get:
# DPT = 237.7 * ( (17.271*TMP/(237.7+TMP)) + log(RHX) ) / (17.271 - ( (17.271*TMP/(237.7+TMP)) + log(RHX) ));

# read temp
tmp <- read.table("out/grid_met_csv/1986/x385y94zTT.txt");  
# read rh
rh <- read.table("out/grid_met_csv/1986/x385y94zRH.txt");
names(rh) <- c('year','mo','da','hr','value')
names(tmp) <- c('year','mo','da','hr','value')

dpt <- sqldf(
  "
    select a.year, a.mo, a.da, a.value as temp, b.value as rh
    from tmp as a 
    left outer join rh as b 
   on (
      a.year = b.year
      and a.mo = b.mo
      and a.da = b.da
      and a.hr = b.hr
    )
")

dpt$dpt <- 237.7 * ( (17.271*dpt$temp/(237.7+dpt$temp)) + log(dpt$rh) ) / (17.271 - ( (17.271*dpt$temp/(237.7+dpt$temp)) + log(dpt$rh) ))

quantile(dpt$dpt)

        0%        25%        50%        75%       100%
-19.296831   3.598815  11.937289  19.029749  25.111637
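A quick way to sanity-check the DPT expression is to evaluate it by hand for a couple of inputs. The sketch below assumes TMP is in degrees Celsius and RHX is relative humidity as a fraction in (0, 1]; both are inferred from the value ranges above rather than confirmed by the source. The function name `dewpoint` is hypothetical. At RHX = 1 the expression reduces algebraically to TMP, which makes a convenient test case.

```shell
#!/bin/sh
# dewpoint TMP RHX   (hypothetical helper)
# Evaluates the DPT expression for one temperature (assumed deg C) and one
# relative humidity (assumed fraction in (0, 1]) and prints the result.
dewpoint() {
  awk -v t="$1" -v rh="$2" 'BEGIN {
    g = 17.271 * t / (237.7 + t) + log(rh)   # awk log() is the natural log
    printf "%.3f\n", 237.7 * g / (17.271 - g)
  }'
}

dewpoint 20 1      # saturated air: dew point should equal the temperature
dewpoint 20 0.5    # drier air: dew point should fall below the temperature
```

If RHX were stored as a percentage instead of a fraction, `log(rh)` would be positive and the result would come out above the air temperature, which is one quick symptom to look for.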

R code to examine the data.

rad <- read.csv('/opt/model/p53/p532c-sova/input/unformatted/nldas2/harp2021/1984010100-2020123123/A51800.RAD')
names(rad) <- c('yr', 'mo', 'da', 'hr', 'value')
quantile(rad$value)
     0%     25%     50%     75%    100%
 0.0000  0.0000  0.0925 29.4794 92.0369
rburghol commented 1 year ago

Details on error

Then, finally, a huge number for the summary annual HPET in 2021:

HPET, x379y95, 2021 , 9657.27148 PROBLEM ERROR Annual data out of range

rburghol commented 1 year ago

Still had a problem, some grid cells fixed, others not.

Batch process:

basin=JA5_7480_0001
segs=`cbp get_landsegs $basin`

badstart=2022010100 
badend=2022123123
dstart=1984010100
dend=2022123123 
i=N51049
./nldas_land_grids $i
# 10 x377y96 x375y97 x376y97 x377y97 x375y98 x376y98 x377y98 x378y98 x375y99 x376y99
# update all the grid cell CSVs in the land segment
grid2land.sh $badstart $badend /backup/meteorology /backup/meteorology/out/grid_met_csv $i
   # just update a single cell:
   # ./g2a_one.bash $badstart $badend /backup/meteorology /backup/meteorology/out/grid_met_csv x376y99
# weight all grid cells into the land segment
a2l_one $dstart $dend /backup/meteorology/out/grid_met_csv /backup/meteorology/out/lseg_csv $i

LongTermAvgRNMax /backup/meteorology/out/lseg_csv/${dstart}-${dend} /backup/meteorology/out/lseg_csv/RNMax 1 $i
wdm_pm_one $i $dstart $dend nldas2 harp2021 nldas1221 p20211221