NOAA-OWP / hydrotools

Suite of tools for retrieving USGS NWIS observations and evaluating National Water Model (NWM) data.

Implement AUC #166

Closed christophertubbs closed 2 years ago

christophertubbs commented 2 years ago

Area-under-curve is something that would be super useful for diagnostic/descriptive purposes.

aaraney commented 2 years ago

Agreed @christophertubbs. I think an ROC chart would be really nice.

jarq6c commented 2 years ago

Area-under-curve is something that would be super useful for diagnostic/descriptive purposes.

Can you be more specific? Is this AUC for any arbitrary function or AUC for a single valued time-series, something else?

How does the interface look? What's the expected return value?

from hydrotools.metrics import metrics

AUC = metrics.area_under_curve(
    parameter_1=something,
    parameter_2=something_else
)

christophertubbs commented 2 years ago

I think we're looking at:

from hydrotools.metrics import metrics
x = [0, 1, 3, 5, 6, 7, 8, 9, 10]
y = [13.7, 14.5, 8.7, 6.66, 5.8, 3.85, 10.1, 11, 19.8]

AUC = metrics.area_under_curve(x, y)

print(AUC)

96.64

It'll be the AUC for a single-valued time series, I think? I have a little more research to do to figure out everything we'll need to do with it. At this point I mostly just know that we need it and people get ornery if I break out sklearn. I think we do/will need the ROC chart, but it hasn't been brought up yet.

Now, what would be REAL cool is if we could get an array back instead of a single value, so we'd get the AUC for x[:2], x[:3], x[:4], etc., so we can see the change in AUC over time. That's more of a "Neat! How can we use that?" idea, though.
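The "array back instead of a single value" idea can be sketched as a running trapezoidal sum. This is only an illustration: `cumulative_auc` is a hypothetical helper, not part of hydrotools, and it assumes the trapezoidal rule (which reproduces the 96.64 from the example above).

```python
from itertools import accumulate


def cumulative_auc(x, y):
    """Running trapezoidal area (hypothetical helper, not a hydrotools API).

    Element i of the result is the area under y over x[:i + 2].
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # Area of each trapezoid between consecutive samples.
    panels = [
        (x1 - x0) * (y0 + y1) / 2.0
        for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:])
    ]
    # Running total of the panel areas.
    return list(accumulate(panels))


x = [0, 1, 3, 5, 6, 7, 8, 9, 10]
y = [13.7, 14.5, 8.7, 6.66, 5.8, 3.85, 10.1, 11, 19.8]

running = cumulative_auc(x, y)
print([round(a, 3) for a in running])  # last entry is the full AUC, about 96.64
```

With SciPy available, `scipy.integrate.cumulative_trapezoid(y, x)` should give the same running totals without a hand-written loop.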

jarq6c commented 2 years ago

OK, this looks like numerical integration. I didn't do the math, but it looks like you're using a typical left-hand Riemann sum to compute the "area under the curve" of the function y(x). This is doable, but will require a little bit of design to allow for different quadrature rules. We'll also want to check for basic continuity/smoothness (i.e. the independent variable needs to be uniquely valued and increasing). We'll also need checks at the ends of the interval. I think the example you posted might need one more x-value to compute the last rectangle.

I'd be surprised if there isn't already a library that does this, but if not I don't have a problem adding this functionality.
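For reference, a minimal sketch of what "different quadrature rules" plus the monotonicity check could look like. The function name, the `rule` parameter, and its allowed values are assumptions made for illustration, not an agreed hydrotools interface.

```python
def area_under_curve(x, y, rule="trapezoid"):
    """Integrate y over x with a simple quadrature rule (hypothetical sketch)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    # The independent variable must be uniquely valued and increasing.
    if any(x1 <= x0 for x0, x1 in zip(x, x[1:])):
        raise ValueError("x must be strictly increasing")

    widths = [x1 - x0 for x0, x1 in zip(x, x[1:])]
    if rule == "left":
        heights = y[:-1]   # left endpoint of each interval
    elif rule == "right":
        heights = y[1:]    # right endpoint of each interval
    elif rule == "trapezoid":
        heights = [(y0 + y1) / 2.0 for y0, y1 in zip(y, y[1:])]
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    return sum(w * h for w, h in zip(widths, heights))


x = [0, 1, 3, 5, 6, 7, 8, 9, 10]
y = [13.7, 14.5, 8.7, 6.66, 5.8, 3.85, 10.1, 11, 19.8]

for rule in ("left", "right", "trapezoid"):
    print(rule, round(area_under_curve(x, y, rule), 2))
```

On the posted data, the left rule gives about 97.51, the right rule about 95.77, and the trapezoid rule 96.64. Only the trapezoid result matches the number in the example, so no extra x-value is needed for the last interval under that rule.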

christophertubbs commented 2 years ago

sklearn does it :D

That's where I got all the math.

hellkite500 commented 2 years ago

get the AUC for x[:2], x[:3], x[:4], etc., so we can see the change in AUC over time

Derivatives, derivatives, derivatives, derivatives... Remember, don't drink and derive.

christophertubbs commented 2 years ago

Do I look like I remember anything from math in college?

jarq6c commented 2 years ago

sklearn does it :D

That's where I got all the math.

scikit-learn already has these methods. I don't think we should be reinventing the wheel. One of the motivations of hydrotools being a Python package suite was potential interoperability with popular well-established scientific computing libraries. The hydrotools.metrics package re-implements some basic metrics for convenience, but the vast majority of the metrics in this package are specific to hydrological evaluation and at the time of development weren't known to exist in other well-maintained modern packages. The interfaces were also modeled after similar metrics methods in sklearn and scipy. So, my vote is to use sklearn.

https://scikit-learn.org/stable/modules/generated/sklearn.metrics.auc.html
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html
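As a rough illustration of the sklearn route, the example data from earlier in the thread can be passed straight to `sklearn.metrics.auc`, which is documented as using the trapezoidal rule. This assumes scikit-learn is installed. Nothing here touches hydrotools itself.

```python
from sklearn.metrics import auc

x = [0, 1, 3, 5, 6, 7, 8, 9, 10]
y = [13.7, 14.5, 8.7, 6.66, 5.8, 3.85, 10.1, 11, 19.8]

# auc(x, y) integrates y over x with the trapezoidal rule; x must be monotonic.
area = auc(x, y)
print(round(float(area), 2))  # about 96.64, matching the example above
```

`roc_auc_score` is a different tool: it takes true labels and predicted scores for a classifier rather than an (x, y) curve, so it would apply to the ROC chart idea and not to this time-series example.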

christophertubbs commented 2 years ago

Alrighty, closing the issue.

hellkite500 commented 2 years ago

Is it worth wrapping the method under the hydrotools package to ensure a uniform interface? If not, then it is probably worth some documentation on how to use scikit/scipy with hydrotools data in conjunction with methods provided by the package.

jarq6c commented 2 years ago

Is it worth wrapping the method under the hydrotools package to ensure a uniform interface? If not, then it is probably worth some documentation on how to use scikit/scipy with hydrotools data in conjunction with methods provided by the package.

Yes, we've discussed adding some example workflows to the repository documentation. An annotated notebook demonstrating how to combine nwis_client, nwm_client, and sklearn is a good idea.