lekoenig closed 2 years ago
River-dl is set up to read the temperature observations in as a zarr store, with dimensions of date and seg_id_nat (or date and COMID in this case). I have a script that converts the files from the data release into this format, but if we can do it in your script then that saves a step. One possible sticking point is getting the dates into numpy datetime format. I'm guessing that's possible within the R script, but it may be something we need to play around with. Also, it looks like the training observations include every date and every segment (though most of those observed temps are NA).
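For reference, here is a minimal Python sketch of the layout described above: dates parsed to numpy datetime64 and a full date-by-COMID grid where unobserved combinations are NA. The function names, the temp_c column, and the example values are hypothetical, and it assumes the input already has at most one row per date and COMID. This is not the existing conversion script, just an illustration of the target shape.

```python
import numpy as np
import pandas as pd


def to_full_grid(obs: pd.DataFrame) -> pd.DataFrame:
    """Expand sparse observations to every date x COMID combination.

    Assumes (hypothetically) columns 'date', 'COMID', and 'temp_c',
    with at most one row per date-COMID pair.
    """
    obs = obs.copy()
    # Parse dates so they are stored as numpy datetime64 values.
    obs["date"] = pd.to_datetime(obs["date"])
    dates = pd.date_range(obs["date"].min(), obs["date"].max(), freq="D")
    comids = np.sort(obs["COMID"].unique())
    full_index = pd.MultiIndex.from_product(
        [dates, comids], names=["date", "COMID"]
    )
    # Combinations with no observation become NaN after reindexing.
    return obs.set_index(["date", "COMID"]).reindex(full_index)


def write_zarr(grid: pd.DataFrame, path: str) -> None:
    """Write the grid as a zarr store with 'date' and 'COMID' dimensions.

    Requires the optional xarray and zarr packages.
    """
    grid.to_xarray().to_zarr(path, mode="w")


# Made-up example data, for illustration only.
obs = pd.DataFrame(
    {
        "date": ["2020-01-01", "2020-01-03"],
        "COMID": [101, 202],
        "temp_c": [4.5, 5.1],
    }
)
grid = to_full_grid(obs)
```

Calling write_zarr(grid, "example.zarr") would then produce a store indexed by date and COMID, assuming xarray and zarr are installed.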
The unaggregated temperature observations come from the 2022 forecasting release. In https://github.com/USGS-R/drb-gw-hw-model-prep/pull/32/commits/d13a122828d82d933a02646bfe7f20f1fa34352a I've added code changes to reduce duplicate observations and summarize the temperature data to get a single value per COMID-date, which we'll use for model evaluation. These code changes call a new function, munge_split_temp_dat(), which I've adapted from USGS-R/delaware-model-prep to summarize the data by COMID rather than NHM segment. The output is a zarr data store that is indexed by 'date' and 'COMID'. The new file drb_temp_observations_nhdv2.zarr is now on caldera.
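To make the "single value per COMID-date" step concrete, here is a rough Python sketch of the idea. It is not a port of munge_split_temp_dat() (which is R and may differ in its details); the function name, the site_id and temp_c input columns, the use of a plain mean, and the example values are all assumptions for illustration.

```python
import pandas as pd


def summarize_by_comid_date(obs: pd.DataFrame) -> pd.DataFrame:
    """Collapse observations to one mean temperature per COMID and date.

    Assumes (hypothetically) columns 'site_id', 'COMID', 'date', 'temp_c'.
    """
    # Drop rows that are exact duplicates of one another.
    deduped = obs.drop_duplicates()
    # Average whatever remains for each COMID-date pair.
    return deduped.groupby(["COMID", "date"], as_index=False).agg(
        mean_temp_c=("temp_c", "mean")
    )


# Made-up example: site A is reported twice, and sites A and B share a COMID.
raw = pd.DataFrame(
    {
        "site_id": ["A", "A", "B", "C"],
        "COMID": [101, 101, 101, 202],
        "date": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-02"]
        ),
        "temp_c": [4.0, 4.0, 6.0, 5.0],
    }
)
daily = summarize_by_comid_date(raw)
```

In this example the duplicated site A row is dropped first, so COMID 101 on 2020-01-01 averages the A and B values rather than counting A twice.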
Can we change the column name "mean_temp_c" to "temp_c" to be consistent with the NHM-resolution temperature observations?
I'm going to go ahead and merge this PR but I've opened a new issue to address the preferred column names for the temperature observations that get aggregated at the NHDPlusv2-scale (#38).
This PR adds a new target (p2_drb_temp_obs_w_segs) and outputs a data frame with the observational time series but with COMID as an additional column. The output file is currently a csv file. Should other formatting changes be made to input to river-dl?
Closes #27