aappling-usgs opened this issue 5 years ago
Another note to add: we've talked about propagating the monitoring agency code through to the final munged dataset, but it might be more useful to carry the monitoring location ID instead. From that ID we can recover the agency code, and with it we could (if needed) run analyses that exclude littoral sites or flag monitoring locations within a lake that have much worse error than others.
Some/all of the following QC may already be in place; check, consider, and implement as needed. See https://github.com/USGS-CIDA/lake-temperature-neural-networks/issues/4 for issue origin and more notes (mostly copied here):
There are a lot of observations that are physically inconsistent in the sense that density does not increase monotonically with depth. What, if anything, do we want to do to remove them? And how do we know which observation is the bad one? (It's not necessarily the first one that violates monotonicity going from top to bottom.) A small density-check sketch is included after these notes.
We'll see a lot of these inconsistencies arise when we merge two different sources of temperature measurements, e.g., a hand profile vs. a buoy.
We'll also see these issues pop up when the monitoring locations are very different, e.g., one in the middle of the lake (the "deep hole" or deepest spot in the lake) vs. a "littoral" site (close to the shore). These aren't necessarily errors; they just expose flaws in the assumption that lake temperatures can be represented by a one-dimensional model.
Even very precise thermistors on buoys can have offsets from one another. I don't have a good example plot handy, but I think Mendota has a buoy that takes one-minute measurements at 27 depths. If you plot those as a time series in the late fall (when the lake is cooling and well mixed), you'll see very high coherence in the data, but slight offsets between depths. I've used this fall pattern to "solve" for the thermistor offsets (they are fairly constant and don't really drift) and then applied the correction to the rest of the dataset; a sketch of that approach appears after these notes.
Some of these are real, the result of temporary instabilities in the water column that get captured by higher (temporal) resolution observations. If we have hourly data, do we grab the noon profile value, or do we summarize to a daily temperature? If we do the former, there is a good chance we'll catch a few of these instabilities. (Both options are sketched below.)
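A minimal sketch of the density-monotonicity check mentioned above. The helper names, the tolerance value, and the use of the Martin & McCutcheon (1999) freshwater density approximation are illustrative assumptions, not the pipeline's actual QC code.

```python
def water_density(temp_c):
    """Approximate freshwater density (kg m^-3) from temperature (deg C).

    Martin & McCutcheon (1999) approximation; ignores salinity/conductivity.
    """
    return 1000.0 * (1.0 - (temp_c + 288.9414) /
                     (508929.2 * (temp_c + 68.12963)) * (temp_c - 3.9863) ** 2)

def density_violations(profile, tol=0.0):
    """Return indices i where density at depth i exceeds density at depth i+1.

    profile: list of (depth_m, temp_C) tuples sorted by increasing depth.
    tol: allowable density decrease (kg m^-3) before flagging; the value used
         below is a placeholder, not a vetted threshold.
    """
    densities = [water_density(t) for _, t in profile]
    return [i for i in range(len(densities) - 1)
            if densities[i] - densities[i + 1] > tol]

# Example: a warmer (less dense) parcel sits below a cooler one at 2-4 m
profile = [(0.5, 24.1), (2.0, 23.8), (4.0, 24.3), (6.0, 18.2)]
print(density_violations(profile, tol=0.05))  # -> [1]
```

Note that this only flags the pair of depths that is unstable; it doesn't decide which of the two observations is "bad," which is the harder question raised above.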
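One concrete reading of the thermistor-offset idea: during a well-mixed period all depths should read the same temperature, so each sensor's offset can be estimated as its average deviation from the profile mean and then subtracted from the full record. The array layout, the simple mean-deviation estimator, and the numbers in the usage example are assumptions, not the exact procedure used for the Mendota buoy.

```python
import numpy as np

def estimate_offsets(mixed_temps):
    """Estimate a constant offset per thermistor from a well-mixed period.

    mixed_temps: array of shape (n_times, n_depths) restricted to the
    well-mixed fall window. Each timestep's lake-mean temperature is taken
    as truth; a sensor's offset is its average deviation from that mean.
    Offsets are only identifiable up to an additive constant, which is
    absorbed into the lake-mean estimate.
    """
    profile_means = mixed_temps.mean(axis=1, keepdims=True)
    return (mixed_temps - profile_means).mean(axis=0)

def apply_offsets(temps, offsets):
    """Subtract the per-depth offsets from the full record."""
    return temps - offsets

# Usage sketch with synthetic data: 27 depths, ~0.05 degC sensor biases
rng = np.random.default_rng(0)
true_offsets = rng.normal(0, 0.05, size=27)
mixed = 10.0 + rng.normal(0, 0.01, (500, 27)) + true_offsets
offsets = estimate_offsets(mixed)
corrected = apply_offsets(mixed, offsets)
```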
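And a small sketch contrasting the two summarization choices for sub-daily data (noon snapshot vs. daily mean). The column names and the toy data are assumptions about the munged observation table, not the project's actual schema.

```python
import numpy as np
import pandas as pd

# Toy hourly record: two depths, two days
times = pd.date_range("2019-07-01", periods=48, freq="h")
hourly = pd.DataFrame({
    "datetime": np.repeat(times, 2),
    "depth": np.tile([0.5, 10.0], len(times)),
})
hourly["temp"] = 20 - hourly["depth"] + np.random.default_rng(1).normal(0, 0.2, len(hourly))
hourly["date"] = hourly["datetime"].dt.date

# Option 1: keep only the noon profile (more likely to catch transient instabilities)
noon = hourly.loc[hourly["datetime"].dt.hour == 12]

# Option 2: average all hours in the day, per depth (smooths out short-lived inversions)
daily_mean = hourly.groupby(["date", "depth"], as_index=False)["temp"].mean()
```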