amplab / keystone

Simplifying robust end-to-end machine learning on Apache Spark.
http://keystone-ml.org/
Apache License 2.0

Do we have a way to calculate AUC for a binary classifier? #288

Closed: shouhengyi-microsoft closed this issue 7 years ago

shouhengyi-microsoft commented 8 years ago

Hi all,

I've been reading the BinaryClassificationMetrics docs [http://keystone-ml.org/api/latest/#evaluation.BinaryClassificationMetrics], but AUC is missing. I'm wondering what the best way is to calculate AUC for binary classification problems.

Thanks.

shivaram commented 8 years ago

The binary classifier evaluator we have right now is a simple class that only computes metrics that fit in a single pass over the data. AUC needs all examples ordered by score, which takes more than one pass. If you need more complex metrics, I think the easiest thing is to feed the output of the Keystone model to Spark MLlib's BinaryClassificationMetrics. The code would look something like this:

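import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

val testActual = ... // Create actual labels as an RDD[Double] (1.0 = positive, 0.0 = negative)
val predictor = ... andThen NaiveBayesEstimator(...)
val predictions = predictor(testData).get // This is RDD[Double] with a score for each example
// Spark expects (score, label) pairs; zip requires both RDDs to have the same
// number of partitions and the same number of elements in each partition.
val metrics = new BinaryClassificationMetrics(predictions.zip(testActual))
val auc = metrics.areaUnderROC() // area under the ROC curve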
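To show why AUC is more than a single-pass metric, here is a minimal local sketch in plain Scala (no Spark). It computes ROC AUC from (score, label) pairs using the rank-sum (Mann-Whitney U) formulation rather than the ROC-curve integration Spark uses. It has to sort every example by score before computing anything, which is the extra pass a simple evaluator avoids. The object name LocalAuc and its method areaUnderROC are hypothetical. The sketch assumes label 1.0 marks a positive example and any other label marks a negative one. Tied scores share their average rank, so a tie counts as half a correctly ordered pair.

```scala
// Hypothetical helper: a local (non-Spark) ROC AUC sketch using the rank-sum
// (Mann-Whitney U) formulation. Label 1.0 = positive; any other label = negative.
object LocalAuc {
  def areaUnderROC(scoreAndLabels: Seq[(Double, Double)]): Double = {
    // Ordering every example by score is what makes AUC more than a single-pass metric.
    val sorted = scoreAndLabels.sortBy(_._1).toIndexedSeq
    val n = sorted.length
    val ranks = new Array[Double](n)

    // Assign 1-based ranks; a run of tied scores shares the average of its ranks.
    var i = 0
    while (i < n) {
      var j = i
      while (j + 1 < n && sorted(j + 1)._1 == sorted(i)._1) j += 1
      val avgRank = (i + j + 2) / 2.0 // mean of ranks (i + 1) .. (j + 1)
      for (k <- i to j) ranks(k) = avgRank
      i = j + 1
    }

    val nPos = sorted.count(_._2 == 1.0)
    val nNeg = n - nPos
    require(nPos > 0 && nNeg > 0, "AUC needs at least one positive and one negative example")

    // U = (sum of positive ranks) - nPos * (nPos + 1) / 2 counts correctly ordered
    // positive/negative pairs (ties count 1/2); dividing by all pairs gives AUC.
    val positiveRankSum = sorted.indices.filter(k => sorted(k)._2 == 1.0).map(k => ranks(k)).sum
    (positiveRankSum - nPos.toDouble * (nPos + 1) / 2.0) / (nPos.toDouble * nNeg)
  }
}
```

For example, LocalAuc.areaUnderROC(Seq((0.9, 1.0), (0.8, 1.0), (0.7, 0.0), (0.6, 1.0), (0.2, 0.0))) returns 5/6 (about 0.833), because five of the six positive/negative pairs are ordered correctly. On data small enough to collect to the driver, this can serve as a cross-check on metrics.areaUnderROC().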