Open jsadler2 opened 3 years ago
Any ideas why this is happening?
It doesn't seem like these are the values in NWIS 🤔 :
Strange. Looks like the negative discharges might be offset?
bad = dplyr::filter(d, discharge_cms < -1000)
plot(bad$discharge_cms~as.Date(bad$date), type = 'o')
Hey Jeff - trying to track down more info here. Maybe this is an out-of-date file? I can't find where it's being created in the pipeline. The flow files are now generated from the national flow pull and then subset to the DRB here. This is a good reminder (for myself) to periodically clean up the Google Drive associated with the project.
@limnoliver - obs_flow_full.csv is built here, which depends on what you linked to above. I just built obs_flow_drb.rds and I see the same bad data.
Thanks Jeff! No wonder I couldn't find it in 2_observations. Will investigate!
Okay, issue partially figured out. My first clue was that the site ID was listed twice, which means there were two unique values on that day, and the data were being aggregated in some way (happening here).
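For anyone wanting to run the same check, here is a minimal sketch of how site-days with more than one value could be flagged. The data frame and its column names (site_id, date, discharge_cms) are made up for illustration and are not the pipeline's actual objects.

```r
# Toy data; column names and values are illustrative assumptions
obs <- data.frame(
  site_id = c("01465500", "01465500", "01465500", "01463500"),
  date = as.Date(c("2019-01-01", "2019-01-01", "2019-01-02", "2019-01-01")),
  discharge_cms = c(12.3, -28316.8, 11.9, 250.0)
)

# Count rows per site-day; more than one row means multiple values were reported
n_per_day <- aggregate(discharge_cms ~ site_id + date, data = obs, FUN = length)
names(n_per_day)[3] <- "n_obs"

# Keep only the site-days that would end up being aggregated
dup_days <- n_per_day[n_per_day$n_obs > 1, ]
print(dup_days)
```

Any site-day that shows up in dup_days is one where an aggregation step has to combine values, so those are the rows worth inspecting by hand.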
Some site-parameter code combos return multiple columns when you retrieve from NWIS. This site, for example, when you pull using dataRetrieval, looks like this:
test <- dataRetrieval::readNWISdv(siteNumbers = '01465500', parameterCd = '00060')
...which likely means discharge is being measured at two locations at the site. Usually in the national temperature pipeline pulls, I pick the "best" column by choosing the column with the most data when I have to (e.g., when there is more than one observation at that site-day). My guess is that we didn't handle this in the national flow pipeline, and so both columns were being passed and then averaged. In theory, I think this is okay, except for the fact that one of those columns had some -999999.0 values, which I assume is an error code.
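As a rough sketch of the "pick the column with the most data" idea (not the actual temperature-pipeline code), something like the following could work. The column names flow_a and flow_b are hypothetical stand-ins for whatever the pull returns, and treating -999999 as missing is the assumption stated above, not a documented NWIS rule.

```r
# Toy stand-in for a daily-value pull that returned two discharge columns.
# flow_a / flow_b and the sentinel value are illustrative assumptions.
dv <- data.frame(
  date = as.Date("2019-01-01") + 0:4,
  flow_a = c(310, 305, 298, 301, 299),
  flow_b = c(312, -999999, -999999, NA, 300)
)
flow_cols <- c("flow_a", "flow_b")
sentinel <- -999999

# Treat the assumed error code as missing before comparing or combining columns
dv[flow_cols] <- lapply(dv[flow_cols], function(x) {
  replace(x, which(x == sentinel), NA)
})

# Pick the column with the most non-missing values
n_valid <- colSums(!is.na(dv[flow_cols]))
best_col <- names(which.max(n_valid))

# Carry only the chosen column forward
clean <- data.frame(date = dv$date, discharge_cfs = dv[[best_col]])
```

Masking the sentinel first matters even if the columns are later averaged instead of selected, since a mean with na.rm = TRUE would then ignore the bad values.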
The weird part is that these -999999.0 values exist in the national pull data (from 2_observations/in/daily_flow.rds) but I can't recreate them from the above NWIS pull. Maybe they were fixed sometime between the national flow pull (~10 months ago) and now?
And just confirming, this appears to be what's happening in the flow pipeline - note here the column selection part is commented out, and then col_name is being dropped when data from uv and dv are bound together.
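A quick back-of-the-envelope check is consistent with this. Assuming the two columns were averaged in cubic feet per second (the unit of parameter 00060) and then converted to cubic meters per second, one -999999 value averaged with an ordinary reading lands near -14150. The 300 cfs reading below is made up, and the order of operations is a guess.

```r
# Hypothetical valid reading (cfs) averaged with the -999999 sentinel
valid_cfs <- 300
sentinel <- -999999
avg_cfs <- mean(c(valid_cfs, sentinel))

# Convert cfs to cms (1 cubic foot = 0.028316846592 cubic meters)
cfs_to_cms <- 0.028316846592
avg_cms <- avg_cfs * cfs_to_cms
print(avg_cms)
```

Because the sentinel dominates the mean, the result barely moves with the size of the valid reading, which would explain why the bad values all cluster around the same number.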
> The weird part is that these -999999.0 values exist in the national pull data (from 2_observations/in/daily_flow.rds) but I can't recreate them from the above NWIS pull. Maybe they were fixed sometime between the national flow pull (~10 months ago) and now?
That is weird. It's kind of comforting that there aren't those values, but also not since now it's a phantom problem.
For my postdoc on metabolism estimation, we re-pulled input data from NWIS about a year after the initial pull and saw groups of sites where whole sections of data changed - one change I remember seemed to have to do with correcting a timezone issue, and I think there were also cases where data that had initially been available but weird were taken off NWIS entirely. So I'm not surprised that there might be similar cases in the discharge data for our current projects.
I found some really wonky flow observations in obs_flow_full.csv. There are 108 observations that have ~-14150 as the value: