Closed christophertubbs closed 2 years ago
Agreed @christophertubbs. I think an ROC chart would be really nice.
Area-under-curve is something that would be super useful for diagnostic/descriptive purposes.
Can you be more specific? Is this AUC for any arbitrary function or AUC for a single valued time-series, something else?
How does the interface look? What's the expected return value?
from hydrotools.metrics import metrics
AUC = metrics.area_under_curve(
    parameter_1=something,
    parameter_2=something_else
)
I think we're looking at:
from hydrotools.metrics import metrics
x = [0, 1, 3, 5, 6, 7, 8, 9, 10]
y = [13.7, 14.5, 8.7, 6.66, 5.8, 3.85, 10.1, 11, 19.8]
AUC = metrics.area_under_curve(x, y)
print(AUC)
96.64
It'll be the AUC for a single-valued time series, I think. I have a little more research to do to figure out everything we'll need from it. At this point I mostly just know that we need it and people get ornery if I break out sklearn. I think we do/will need the ROC chart, but it hasn't been brought up yet.
Now, what would be REAL cool is if we could get an array back instead of a single value, so we'd get the AUC for x[:2], x[:3], x[:4], etc., and could see the change in AUC over time. That's more of a "Neat! How can we use that?" idea, though.
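For illustration, the running-AUC idea above could be sketched as a cumulative trapezoidal sum. This is a pure-Python sketch, not part of hydrotools: the name `cumulative_auc` is hypothetical, and the trapezoidal rule is an assumption chosen because it reproduces the 96.64 in the example.

```python
from itertools import accumulate


def cumulative_auc(x, y):
    """Return running trapezoidal areas.

    Entry i is the area under y(x) from x[0] to x[i + 1], so the first
    entry corresponds to x[:2] and the last to the full series.
    Hypothetical helper for illustration only.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        raise ValueError("at least two points are required")
    # Area of each trapezoid between consecutive samples.
    slices = [
        (x1 - x0) * (y0 + y1) / 2
        for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:])
    ]
    # Running total of the trapezoid areas.
    return list(accumulate(slices))


x = [0, 1, 3, 5, 6, 7, 8, 9, 10]
y = [13.7, 14.5, 8.7, 6.66, 5.8, 3.85, 10.1, 11, 19.8]
running = cumulative_auc(x, y)
print([round(a, 2) for a in running])
```

The final entry is the same total as the single-value AUC, so one function could serve both uses.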
OK, this looks like numerical integration. I didn't do the math, but it looks like you're using a typical left-hand Riemann sum to compute the "area under the curve" of the function y(x). This is doable, but will require a little bit of design to allow for different quadrature rules. We'll also want to check for basic continuity/smoothness (i.e. the independent variable needs to be uniquely valued and increasing). We'll also need checks at the ends of the interval. I think the example you posted might need one more x-value to compute the last rectangle.
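As a quick check on which quadrature rule the example uses, here is a pure-Python sketch of both rules with the input checks mentioned above. The function names are hypothetical. On the posted data the trapezoidal rule gives about 96.64, matching the example output, while the left-hand Riemann sum gives about 97.51.

```python
def _check_inputs(x, y):
    """Basic validation: equal lengths and strictly increasing x."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(x) < 2:
        raise ValueError("at least two points are required")
    if any(b <= a for a, b in zip(x, x[1:])):
        raise ValueError("x must be strictly increasing")


def left_riemann(x, y):
    """Left-hand Riemann sum: each rectangle uses the left sample height."""
    _check_inputs(x, y)
    # zip stops at the shortest input, so the last y value is never used.
    return sum((x1 - x0) * y0 for x0, x1, y0 in zip(x, x[1:], y))


def trapezoid(x, y):
    """Trapezoidal rule: each slice averages its two endpoint heights."""
    _check_inputs(x, y)
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:])
    )


x = [0, 1, 3, 5, 6, 7, 8, 9, 10]
y = [13.7, 14.5, 8.7, 6.66, 5.8, 3.85, 10.1, 11, 19.8]
left_area = left_riemann(x, y)
trap_area = trapezoid(x, y)
print(round(left_area, 2), round(trap_area, 2))
```

With n samples, both rules cover n - 1 intervals, so no extra x-value is needed to close the last slice.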
I'd be surprised if there isn't already a library that does this, but if not I don't have a problem adding this functionality.
scikit-learn does it :D
That's where I got all the math.
get the AUC for x[:2], x[:3], x[:4], etc., so we can see the change in AUC over time
Derivatives, derivatives, derivatives, derivatives.....Remember, don't drink and derive.
Do I look like I remember anything from math in college?
scikit-learn already has these methods. I don't think we should be reinventing the wheel. One of the motivations of hydrotools being a Python package suite was potential interoperability with popular, well-established scientific computing libraries. The hydrotools.metrics package re-implements some basic metrics for convenience, but the vast majority of the metrics in this package are specific to hydrological evaluation and, at the time of development, weren't known to exist in other well-maintained modern packages. The interfaces were also modeled after similar metrics methods in sklearn and scipy. So, my vote is to use sklearn.
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html
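For reference, a minimal sketch of the sklearn route using the data from the earlier example. `sklearn.metrics.auc` takes the x and y coordinates of a curve and integrates with the trapezoidal rule; it expects x to be monotonic. `roc_auc_score` is a different tool (it scores classifier outputs against true labels) and is not shown here.

```python
from sklearn.metrics import auc

# Same sample points as the example earlier in the thread.
x = [0, 1, 3, 5, 6, 7, 8, 9, 10]
y = [13.7, 14.5, 8.7, 6.66, 5.8, 3.85, 10.1, 11, 19.8]

# Trapezoidal area under the curve defined by (x, y).
area = auc(x, y)
print(round(float(area), 2))
```

For a real time series, the timestamps would first need converting to a numeric axis (for example, elapsed seconds or hours), which also fixes the units of the resulting area.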
Alrighty, closing the issue.
Is it worth wrapping the method under the hydrotools package to ensure a uniform interface? If not, then it is probably worth some documentation on how to use scikit/scipy with hydrotools data in conjunction with methods provided by the package.
Yes, we've discussed adding some example workflows to the repository documentation. An annotated notebook demonstrating how to combine nwis_client, nwm_client, and sklearn is a good idea.