Exploratory Data Analysis for Reach and Gage Features, and Streamflow Metrics

USGS-R / regional-hydrologic-forcings-ml

Repo for machine learning models for regional prediction of hydrologic forcing functions. Includes probabilistic seasonal high flow regions for CONUS, and prediction of high flow metrics for selected regions.

Creative Commons Zero v1.0 Universal


Exploratory Data Analysis for Reach and Gage Features, and Streamflow Metrics #34

Open jds485 opened 2 years ago

jds485 commented 2 years ago

Explore NHD (reach) attributes, GAGES2.1 (site) attributes, and streamflow metrics:

- spatial coverage of the attributes in our regions of interest
- identification and investigation of rogue/outlier data
- visualization of features and flow metrics on maps of reaches for each region (example: avg. monthly precip for all gages in each region)

(Future task) Make feature maps and analyses for all reaches and compare them with the same maps and analyses for only those reaches that have GAGES2.1 sites. This relates to issue #31.

Covers project Task 3.1: Visualize predictors (watershed attributes) and responses (flow metrics) by making exploratory maps and figures (check data quality, identify errors, etc.)
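The coverage and outlier checks listed above could be sketched roughly as follows. This is a minimal standard-library illustration, not the project's code: the record layout, the `region` key, the `precip_mm` attribute name, and the choice of Tukey fences (1.5 x IQR) as the outlier rule are all assumptions made for the example.

```python
from statistics import quantiles

def coverage_by_region(records, attribute):
    """Fraction of gages in each region with a non-missing value for `attribute`."""
    totals, present = {}, {}
    for rec in records:
        region = rec["region"]  # hypothetical key holding the region label
        totals[region] = totals.get(region, 0) + 1
        if rec.get(attribute) is not None:
            present[region] = present.get(region, 0) + 1
    return {r: present.get(r, 0) / totals[r] for r in totals}

def flag_outliers(values, k=1.5):
    """Indices of values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    clean = [v for v in values if v is not None]
    q1, _, q3 = quantiles(clean, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values)
            if v is not None and (v < low or v > high)]

# Hypothetical gage records; "precip_mm" is an illustrative attribute name.
gages = [
    {"region": "A", "precip_mm": 80.0},
    {"region": "A", "precip_mm": None},
    {"region": "A", "precip_mm": 95.0},
    {"region": "B", "precip_mm": 40.0},
]
coverage = coverage_by_region(gages, "precip_mm")
suspect = flag_outliers([1, 2, 3, 4, 5, 100])
```

Flagged indices are only candidates for investigation; a value outside the fences may still be physically valid for a given region.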

jds485 commented 2 years ago

A function that can be adapted for attribute and metric visualizations (violin plot + map): https://github.com/USGS-R/drb-inland-salinity-ml/blob/main/3_visualize/src/plot_nhdv2_attr.R

slevin75 commented 2 years ago

@jds485 I'm getting started on this and have some code to produce maps and violin plots of the metrics by cluster. We now have a lot of different metric and cluster targets, though, and I'm not sure which of them I should be using. For metrics we have p1_HIT_metrics, p1_FDC_metrics, p1_FDC_metrics_season, and p1_FDC_metrics_season_high. For cluster targets we have p3_gages_clusters, p3_gages_clusters_quants, p3_gages_clusters_quants_agg, and p3_gages_clusters_quants_agg_selected. Do we need to look at all of these clusterings, or just the _selected one?

jds485 commented 2 years ago

Thanks for checking. p1_FDC_metrics and p1_HIT_metrics are the period-of-record metrics that we'll predict.

p3_gages_clusters_quants_agg_selected is the target with columns for 5 clusters (_k5 at the end of the column name). We'll use those clusters for model regions.
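As a rough illustration of picking out the 5-cluster columns by their `_k5` suffix, here is a minimal sketch. The column names in the example are hypothetical; only the `_k5` suffix comes from the description above.

```python
def select_k_columns(column_names, k=5):
    """Return the column names that end with the `_k<k>` suffix."""
    suffix = f"_k{k}"
    return [name for name in column_names if name.endswith(suffix)]

# Hypothetical column names, for illustration only.
columns = ["ID", "midhigh_k3", "midhigh_k5", "high_k5"]
k5_columns = select_k_columns(columns)
```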

slevin75 commented 2 years ago

How are we planning to cluster the HIT metrics? Unlike the FDC_metrics, they don't have quantiles associated with them.

jds485 commented 2 years ago

Let's use the 0.5-0.7 clusters for 'ma1', 'ml17', and 'ml18'. Most of the other HIT metrics are for 75th-percentile flows, and rise/fall rates don't really depend on quantile, so I think the rest of the metrics can be grouped with the 0.75-0.95 clusters.
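The grouping rule described here can be written as a small lookup. This is only a sketch of the rule as stated: the function name and the string labels for the two cluster groups are assumptions, and 'dh1' and 'ra1' are used purely as example names for "other" HIT metrics.

```python
# Metrics assigned to the 0.5-0.7 quantile clusters, per the comment above.
MID_QUANTILE_METRICS = {"ma1", "ml17", "ml18"}

def cluster_group_for_metric(metric):
    """Return the quantile-cluster group label to use for a HIT metric."""
    if metric in MID_QUANTILE_METRICS:
        return "0.5-0.7"
    # All remaining HIT metrics are grouped with the high-flow clusters.
    return "0.75-0.95"

# 'dh1' and 'ra1' are illustrative stand-ins for the remaining metrics.
groups = {m: cluster_group_for_metric(m)
          for m in ["ma1", "ml17", "ml18", "dh1", "ra1"]}
```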