DistrictDataLabs / yellowbrick

Visual analysis and diagnostic tools to facilitate machine learning model selection.
http://www.scikit-yb.org/
Apache License 2.0
4.3k stars, 559 forks

Implement Detection Error Tradeoff Curves (DET) Visualizer #453

Closed ndanielsen closed 4 years ago

ndanielsen commented 6 years ago

A community suggestion from Reddit: implement a DET curve visualizer for model comparison.

From Wikipedia:

A detection error tradeoff (DET) graph is a graphical plot of error rates for binary classification systems, plotting the false rejection rate vs. false acceptance rate. The x- and y-axes are scaled non-linearly by their standard normal deviates (or just by logarithmic transformation), yielding tradeoff curves that are more linear than ROC curves, and use most of the image area to highlight the differences of importance in the critical operating region.
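To make the definition above concrete, here is a minimal sketch of how the two error rates and the normal-deviate axis scaling could be computed from classifier scores. The function names `det_points` and `normal_deviate` are made up for illustration and are not part of Yellowbrick; the sketch assumes 0/1 labels where 1 is the positive class and higher scores mean "more likely positive".

```python
from statistics import NormalDist


def det_points(y_true, scores):
    """Return (fpr, fnr) lists, one pair per distinct score threshold.

    Hypothetical helper, not a Yellowbrick function.
    fpr is the false acceptance rate, fnr the false rejection rate.
    """
    # Sweep thresholds from the highest score down to the lowest.
    pairs = sorted(zip(scores, y_true), reverse=True)
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    fpr, fnr = [], []
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all samples tied at this score are counted.
        last_of_tie = i == len(pairs) - 1 or pairs[i + 1][0] != score
        if last_of_tie:
            fpr.append(fp / n_neg)
            fnr.append((n_pos - tp) / n_pos)
    return fpr, fnr


def normal_deviate(rate, eps=1e-6):
    """Inverse standard normal CDF of a rate, clipped away from 0 and 1.

    The clipping value eps is an arbitrary choice for this sketch.
    """
    clipped = min(max(rate, eps), 1 - eps)
    return NormalDist().inv_cdf(clipped)
```

Plotting `normal_deviate(fpr)` against `normal_deviate(fnr)` gives the normal-deviate version of the curve; swapping in a logarithm gives the log-scaled variant mentioned in the quote.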

Aspirational example, a Wikipedia image comparing multiple models:

[Image: example of DET curves]

Sample code for plotting one model on a matplotlib axes object. Reference link below:

from matplotlib import pyplot as plt
from matplotlib import ticker

def DETCurve(fps, fns):
    """
    Given false positive and false negative rates, produce a DET Curve.
    The false positive rate is assumed to be increasing while the false
    negative rate is assumed to be decreasing.
    """
    fig, ax = plt.subplots()
    ax.plot(fps, fns)
    # Log-scale both axes to spread out the low-error operating region.
    # Set the scales before the formatters, since changing the scale
    # resets the axis formatter.
    ax.set_yscale('log')
    ax.set_xscale('log')
    # Ticks span 0.001 to 50, which suggests the rates are expected as
    # percentages rather than fractions (an assumption, not stated above).
    ticks_to_use = [0.001,0.002,0.005,0.01,0.02,0.05,0.1,0.2,0.5,1,2,5,10,20,50]
    ax.xaxis.set_major_formatter(ticker.ScalarFormatter())
    ax.yaxis.set_major_formatter(ticker.ScalarFormatter())
    ax.set_xticks(ticks_to_use)
    ax.set_yticks(ticks_to_use)
    ax.axis([0.001, 50, 0.001, 50])
    return fig, ax

Suggested Visualizer Interface

models = [LogisticRegression(), AnotherModel(), DifferentModel()]
viz = DETCurve(models)
viz.fit(X, y)
viz.poof()
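For discussion, a rough sketch of what a class behind that interface might look like. Everything here is hypothetical: `DETCurveSketch`, `_error_rates`, and the `curves_` attribute are illustrative names, not existing Yellowbrick API. The sketch assumes each model is a binary classifier exposing `fit` and `predict_proba`, and it scores the same data it was fit on, which a real visualizer would probably replace with a held-out set.

```python
def _error_rates(y_true, scores):
    """Hypothetical helper: (fpr, fnr) at each distinct score threshold."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    thresholds = sorted(set(scores), reverse=True)
    # A sample is accepted as positive when its score is >= the threshold.
    fpr = [sum(s >= t for s in neg) / len(neg) for t in thresholds]
    fnr = [sum(s < t for s in pos) / len(pos) for t in thresholds]
    return fpr, fnr


class DETCurveSketch:
    """Hypothetical multi-model DET visualizer (not Yellowbrick API)."""

    def __init__(self, models):
        self.models = list(models)
        self.curves_ = {}

    def fit(self, X, y):
        for i, model in enumerate(self.models):
            model.fit(X, y)
            # Assumes column 1 of predict_proba is the positive class.
            scores = [row[1] for row in model.predict_proba(X)]
            name = f"{i}:{type(model).__name__}"
            self.curves_[name] = _error_rates(y, scores)
        return self

    def poof(self):
        # Imported lazily so fit() works without a plotting backend.
        import matplotlib.pyplot as plt

        fig, ax = plt.subplots()
        for name, (fpr, fnr) in self.curves_.items():
            # Zero rates cannot be drawn on a log axis, so skip them.
            points = [(a, b) for a, b in zip(fpr, fnr) if a > 0 and b > 0]
            if points:
                xs, ys = zip(*points)
                ax.plot(xs, ys, label=name)
        ax.set_xscale("log")
        ax.set_yscale("log")
        ax.set_xlabel("False positive rate")
        ax.set_ylabel("False negative rate")
        ax.legend()
        plt.show()
        return ax
```

Keeping the rate computation in `fit` and the drawing in `poof` mirrors the fit-then-poof flow proposed above, and lets the curves be inspected through `curves_` before anything is drawn.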

Sample code reference link: https://jeremykarnowski.wordpress.com/2015/08/07/detection-error-tradeoff-det-curves/

Source Reddit comment: https://www.reddit.com/r/MachineLearning/comments/8mbif5/news_new_release_of_python_ml_visualization/dzpford/

bbengfort commented 6 years ago

Wanted to add a paper reference here, which should be included in the docs when implemented:

Martin, Alvin, George Doddington, Terri Kamm, Mark Ordowski, and Mark Przybocki. 1997. “The DET Curve in Assessment of Detection Task Performance.” National Inst of Standards and Technology Gaithersburg MD.

And another reference to a Matlab toolkit for DET curves: https://sites.google.com/site/nikobrummer/

Thanks, @ndanielsen, for a very thorough feature issue!

rebeccabilbro commented 4 years ago

Going to archive this one for now since the conversation has gone a bit stale; happy to reopen if someone has bandwidth and interest!