USGS-R / regional-hydrologic-forcings-ml

Repo for machine learning models for regional prediction of hydrologic forcing functions. Includes probabilistic seasonal high flow regions for CONUS, and prediction of high flow metrics for selected regions.
Creative Commons Zero v1.0 Universal

Metric Selection and Normalization #181

Closed slevin75 closed 1 year ago

slevin75 commented 1 year ago

Discussed in https://github.com/USGS-R/regional-hydrologic-forcings-ml/discussions/14

Originally posted by **jds485** November 18, 2021

**Metric Selection**

*Log-transform Issues*

Some of the EflowStats metrics use log transformations and will not work with zeros in the record. Some of these are also confusing to interpret and may not be useful for this project. I think these can be omitted to avoid problems for such sites:

DH22, DH23, DH24, FH11, MA4, MA9, MA10, MA11, MH18, MH19, TA1, TA2, TA3, TL3, TL4, TH3, RA6, RA7

The `floodThreshold` argument can be set to NULL because it also relies on a log transform to compute. As far as I can tell, the only metrics that rely on this threshold are in the above list, so I don't think we need an alternative method of computing the threshold. If we do use the threshold, we may want to increase the return period (the current one is the 1.67-year flood).

*Monthly and Annual Metrics*

Are annual and monthly flows useful for this project? I think we can omit these and focus on the metrics for daily flows and annual maximums.

*Metrics not in EflowStats*

To predict the flow duration curve, we will need to add additional metrics for % exceedances. EflowStats has the 50th, 75th, 90th, and 99th percentiles. Maybe at 5% or 10% intervals? This applies to flood pulses (e.g., FH5), durations (e.g., DH17), and magnitudes (e.g., MH15).

**Normalization**

Dividing by drainage area (DA) improved predictions for metrics in Eng et al. (2017). MA41, ML22, and MH20 are already normalized by DA. I've made a list of metrics for which DA normalization could make sense:

MA1, MA2, MA12, MA13, MA14, MA15, MA16, MA17, MA18, MA19, MA20, MA21, MA22, MA23, DH1, DH2, DH3, DH4, DH5, RA1, RA3

Some metrics are divided by the mean or median annual discharge. I think these can be un-normalized by the mean or median and then normalized by DA:

ML17, ML19, MH14, MH15, MH16, MH17, MH21, MH22, MH23, MH24, MH25, MH26, MH27
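To make the two proposals in this post concrete, here is a minimal Python sketch (the project itself uses R and EflowStats, so this is illustration only). It assumes daily flows are a plain list of numbers and that a metric was originally divided by a single mean or median flow value. The names `exceedance_flows` and `renormalize_by_drainage_area` are hypothetical helpers, not EflowStats functions, and the interpolation method is one choice among several.

```python
import statistics


def exceedance_flows(daily_flows, step_pct=10):
    """Flow exceeded pct% of the time, for pct = step_pct, 2*step_pct, ... < 100.

    Hypothetical helper: the flow exceeded pct% of the time is taken as the
    (100 - pct)th percentile of the daily record.
    """
    # 99 cut points; element p - 1 is the p-th percentile (linear interpolation).
    percentiles = statistics.quantiles(daily_flows, n=100, method="inclusive")
    return {pct: percentiles[100 - pct - 1] for pct in range(step_pct, 100, step_pct)}


def renormalize_by_drainage_area(metric_value, flow_scale, drainage_area):
    """Undo division by a mean/median flow, then divide by drainage area.

    flow_scale is the mean or median flow the metric was originally divided by
    (assumed known); units of the result depend on the inputs.
    """
    if drainage_area <= 0:
        raise ValueError("drainage_area must be positive")
    return metric_value * flow_scale / drainage_area


# Toy record: flows 0..100, so the p-th percentile equals p.
fdc = exceedance_flows(list(range(101)), step_pct=10)
rescaled = renormalize_by_drainage_area(2.0, flow_scale=50.0, drainage_area=25.0)
```

Passing `step_pct=5` gives the 5% spacing mentioned as the alternative.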
slevin75 commented 1 year ago

slevin75 on Nov 22, 2021 Maintainer I agree with avoiding a lot of those in the first list, although DH23 is on my short list of keepers even though it uses the flood threshold. That 1.67-year threshold is, I think, a surrogate for bankfull flow, and I know that is used by a lot of sediment folks as a critical flow threshold for sediment transport. I think the duration above that threshold is what they used in that recent Gervasi paper that we had talked about a couple weeks ago. The only place this might be an issue is if the peak flow values had zeros. I don't know how much of an issue this is in our area: a stream would have to be dry for an entire year to have a zero peak flow value. There are some other high flow durations in there that use quantile thresholds instead of the flood threshold, and I don't know how those would compare to the flood durations or whether they are meaningful for sediment dynamics.

I am a little overwhelmed by the sheer number of potential metrics here and I don't feel like I know enough about sediment transport to know which ones would be the most meaningful. How many do we want on our final list to predict? I wonder if it would be worth getting some input from a geomorphologist to help narrow down our short list? I work with Faith Fitzpatrick a bit and I feel like she would be very insightful into which metrics are worth our time.
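The zero-peak concern in this comment can be checked before any log transform is applied. A minimal Python sketch, assuming annual peak flows are held in a dict keyed by year; `log10_peaks` is a hypothetical helper and does not reproduce how EflowStats computes the flood threshold:

```python
import math


def log10_peaks(annual_peaks):
    """Return (log10 of the positive peaks, sorted list of years with a non-positive peak).

    annual_peaks: dict mapping year -> annual peak flow (assumed layout).
    Years with a zero (or negative) peak are reported instead of being
    passed to log10, which would raise ValueError.
    """
    zero_years = sorted(year for year, q in annual_peaks.items() if q <= 0)
    logs = {year: math.log10(q) for year, q in annual_peaks.items() if q > 0}
    return logs, zero_years


# Toy data: one dry year among three.
logs, zero_years = log10_peaks({2001: 100.0, 2002: 0.0, 2003: 1000.0})
```

If `zero_years` comes back empty for every site, the log-based threshold is not a problem for that data set.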

slevin75 commented 1 year ago

jds485 on Nov 22, 2021 Maintainer Author For DH23, see comment here about the durations being cut off at the beginning and end of the water / calendar year.

Yes, it is a lot of metrics. Even removing these metrics, we would have ~50 to evaluate. I am not worried about computation time to train models for that many metrics, but it would be helpful to make sure we've captured metrics that are meaningful for geomorphology, and that we're not missing important ones. We can discuss in our meeting today if/when we should schedule a call with Faith or others.

slevin75 commented 1 year ago

slevin75 on Nov 22, 2021 Maintainer DH 23/24 are not defined as pulse (event) durations, they are just the mean number of days per year above the threshold so I feel like it is ok if it is cutting through an event. Does seem weird to do that for DH22 though.
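As described in this comment, the statistic is a per-year count of days rather than an event duration, so an event that straddles a year boundary simply contributes days to both years. A minimal Python sketch of that reading, assuming `(date, flow)` pairs and water years; `mean_days_above` is a hypothetical helper and may differ from the exact EflowStats definitions of DH23/DH24:

```python
from collections import defaultdict
from datetime import date


def water_year(d):
    """Water year runs Oct 1 to Sep 30 and is named for the year in which it ends."""
    return d.year + 1 if d.month >= 10 else d.year


def mean_days_above(daily, threshold):
    """Mean number of days per water year with flow above threshold.

    daily: iterable of (date, flow) pairs (assumed layout). Years with no
    exceedance still count as zero-day years in the mean.
    """
    counts = defaultdict(int)
    for day, flow in daily:
        counts[water_year(day)] += int(flow > threshold)
    if not counts:
        raise ValueError("no daily values supplied")
    return sum(counts.values()) / len(counts)


# Toy record: two days above 10 in water year 2021, none in water year 2022.
record = [
    (date(2020, 10, 1), 5.0),
    (date(2020, 10, 2), 15.0),
    (date(2021, 9, 30), 20.0),
    (date(2021, 10, 1), 3.0),
]
mean_days = mean_days_above(record, threshold=10.0)
```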

slevin75 commented 1 year ago

jds485 on Nov 22, 2021 Maintainer Author That's a good catch. I agree that's less of a concern for DH23/24. Exceedances may not be independent from one year to the next (the peaks-over-threshold problem), but I think it's okay to use these metrics for long-term averages.

slevin75 commented 1 year ago

cstillwellusgs on Nov 22, 2021 Maintainer Will we be able to talk through the metrics during today's meeting? I haven't had a chance to look yet but I think it would be good to brainstorm together.

Charlie

jds485 on Nov 22, 2021 Maintainer Author Yes, I've blocked off time for us to discuss today.