dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml

Cannot set the threshold on a binary predictor #2465

Closed rogancarr closed 5 years ago

rogancarr commented 5 years ago

It is no longer possible to set a custom Threshold and ThresholdColumn on a binary classifier.

Previously, we had been using BinaryPredictionTransformer. Recently, BinaryPredictionTransformer was marked internal and is no longer available for use outside of the library.

Related to question #403

rogancarr commented 5 years ago

One suggestion is to make the Threshold and ThresholdColumn properties of the prediction transform settable. They are currently only gettable.

Something like this?

var model = mlContext.BinaryClassification.Trainers.LogisticRegression().Fit(train);
var predictor = model.LastTransformer;
predictor.Threshold = 0.01;
// or
predictor.SetThreshold(0.01);
predictor.SetThresholdColumn("Bananas");
Ivanidzo4ka commented 5 years ago

https://github.com/dotnet/machinelearning/blob/master/test/Microsoft.ML.Tests/Scenarios/Api/Estimators/ReconfigurablePrediction.cs

rogancarr commented 5 years ago

@Ivanidzo4ka I am pulling these out of Tests/ and into a new Functional.Tests project that does not have InternalsVisibleTo, in accordance with #2306.

This test relies on internal APIs, so it doesn't work from an end-user perspective.

eerhardt commented 5 years ago

YAY! The first value of these tests is now realized. 😄

rogancarr commented 5 years ago

For anyone looking for this functionality, there is a workaround discussed in #2645.

eerhardt commented 5 years ago

Note that we are doing this exact scenario in our samples:

https://github.com/dotnet/machinelearning-samples/blob/73803ce861e16b48edefaad97771bcd721e268d6/samples/csharp/getting-started/BinaryClassification_SpamDetection/SpamDetectionConsoleApp/Program.cs#L61-L66

            // The dataset we have is skewed, as there are many more non-spam messages than spam messages.
            // While our model is relatively good at detecting the difference, this skewness leads it to always
            // say the message is not spam. We deal with this by lowering the threshold of the predictor. In reality,
            // it is useful to look at the precision-recall curve to identify the best possible threshold.
            var inPipe = new TransformerChain<ITransformer>(model.Take(model.Count() - 1).ToArray());
            var lastTransformer = new BinaryPredictionTransformer<IPredictorProducing<float>>(
                mlContext,
                model.LastTransformer.Model,
                inPipe.GetOutputSchema(data.Schema),
                model.LastTransformer.FeatureColumn,
                threshold: 0.15f,
                thresholdColumn: DefaultColumnNames.Probability);

@CESARDELATORRE

rogancarr commented 5 years ago

@eerhardt This is broken as of 0.11.

eerhardt commented 5 years ago

I think we need to fix it. This is a “take back” in functionality that we said we needed for v1.

rogancarr commented 5 years ago

I've added these back to Project 13. Let's discuss these in scrum today and update the issues once we have made a group decision.

rogancarr commented 5 years ago

Discussion from scrum: These will be stretch goals for V1 and will be taken up after the rest of the Project 13 issues are exhausted. The justification is that these are rather advanced options, and they are technically possible to implement without helper APIs (see the workaround proposed here).

Ivanidzo4ka commented 5 years ago

Since I'm working on this: we technically have two thresholds, one used during metric calculation and one used at prediction time.

var model = pipeline.Fit(data);
// -5f is a raw-score threshold (probabilities live in [0, 1]).
var newModel = mlContext.BinaryClassification.ChangeThreshold(model, -5f);
var engine = model.CreatePredictionEngine<TweetSentiment, Answer>(mlContext);
var newEngine = newModel.CreatePredictionEngine<TweetSentiment, Answer>(mlContext);
var pr = engine.Predict(new TweetSentiment() { SentimentText = "Good Bad job" });
Assert.True(pr.PredictedLabel);
pr = newEngine.Predict(new TweetSentiment() { SentimentText = "Good Bad job" });
Assert.False(pr.PredictedLabel);
// Evaluate the model.
var scoredData = model.Transform(data);
var metrics = mlContext.BinaryClassification.Evaluate(scoredData);
var newScoredData = newModel.Transform(data);
var newMetrics = mlContext.BinaryClassification.Evaluate(newScoredData);
// The evaluator applies its own threshold, so the metrics come out identical.
Assert.Equal(metrics.Accuracy, newMetrics.Accuracy);

I'm making changes that allow the user to change the threshold, but only at prediction time; during metric calculation it remains the same, because we have a Scorer and we have an Evaluator, and they don't respect each other. Should they? @rogancarr @TomFinley
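
For concreteness, a minimal sketch of what such a prediction-time helper could do internally, modeled on the sample-code workaround quoted above: re-wrap the trained model in a new BinaryPredictionTransformer carrying the new threshold. The ChangeThreshold name and signature here are hypothetical, and the constructor is internal as of 0.11, so this only compiles where internals are visible.

// Hypothetical sketch, not a shipped API: copy the chain, swapping in a last
// transformer that thresholds the raw Score column at the requested value.
public static ITransformer ChangeThreshold(
    MLContext mlContext, TransformerChain<ITransformer> model,
    IDataView data, float threshold)
{
    // Keep everything except the final prediction transformer.
    var inPipe = new TransformerChain<ITransformer>(
        model.Take(model.Count() - 1).ToArray());

    var last = (BinaryPredictionTransformer<IPredictorProducing<float>>)model.LastTransformer;

    // Re-create the prediction transformer with the new threshold; a value
    // like -5f only makes sense against the raw Score column.
    var newLast = new BinaryPredictionTransformer<IPredictorProducing<float>>(
        mlContext, last.Model, inPipe.GetOutputSchema(data.Schema),
        last.FeatureColumn, threshold: threshold,
        thresholdColumn: DefaultColumnNames.Score);

    return new TransformerChain<ITransformer>(
        inPipe.Concat(new ITransformer[] { newLast }).ToArray());
}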

Also, the Evaluator lets the user specify things like:

            [Argument(ArgumentType.AtMostOnce, HelpText = "Probability value for classification thresholding")]
            public Single Threshold;

            [Argument(ArgumentType.AtMostOnce, HelpText = "Use raw score value instead of probability for classification thresholding", ShortName = "useRawScore")]
            public bool UseRawScoreThreshold = true;

            [Argument(ArgumentType.AtMostOnce, HelpText = "The number of samples to use for p/r curve generation. Specify 0 for no p/r curve generation", ShortName = "numpr")]
            public int NumRocExamples;

            [Argument(ArgumentType.AtMostOnce, HelpText = "The number of samples to use for AUC calculation. If 0, AUC is not computed. If -1, the whole dataset is used", ShortName = "numauc")]
            public int MaxAucExamples = -1;

            [Argument(ArgumentType.AtMostOnce, HelpText = "The number of samples to use for AUPRC calculation. Specify 0 for no AUPRC calculation", ShortName = "numauprc")]
            public int NumAuPrcExamples = 100000;

and we expose none of these in our Evaluate method.
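
For comparison, the public Evaluate surface only lets you point at columns; there is no threshold knob. The v1-era parameter names shown below are an assumption (earlier releases used shorter names):

// Column names are configurable, but note there is no threshold parameter.
var metrics = mlContext.BinaryClassification.Evaluate(scoredData,
    labelColumnName: "Label",
    scoreColumnName: "Score",
    probabilityColumnName: "Probability");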

rogancarr commented 5 years ago

I think it makes sense to keep them different.

Setting the threshold on the predictor actually changes the pipeline, whereas changing the evaluator lets you ask "what if" scenarios.

Plus, the Evaluator needs to threshold for AUC and AUPRC anyway, so we (probably) won't be able to toss that code.

I don't hold this opinion very strongly, though.
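
As an illustration of that "what if" use, here is a rough sketch (assuming v1-era mlContext.Data.CreateEnumerable and the default Label/Probability column names) that scores once and then sweeps thresholds in memory, leaving the pipeline untouched:

using System;
using System.Linq;
using Microsoft.ML;

public class ScoredRow
{
    public bool Label { get; set; }
    public float Probability { get; set; }
}

public static class ThresholdSweep
{
    // Score once, then ask "what if" at several candidate thresholds without
    // re-fitting or rebuilding anything.
    public static void Run(MLContext mlContext, ITransformer model, IDataView data)
    {
        var rows = mlContext.Data
            .CreateEnumerable<ScoredRow>(model.Transform(data), reuseRowObject: false)
            .ToList();

        foreach (var t in new[] { 0.15f, 0.3f, 0.5f, 0.7f })
        {
            int tp = rows.Count(r => r.Probability >= t && r.Label);
            int fp = rows.Count(r => r.Probability >= t && !r.Label);
            int fn = rows.Count(r => r.Probability < t && r.Label);

            double precision = tp + fp == 0 ? 0 : (double)tp / (tp + fp);
            double recall = tp + fn == 0 ? 0 : (double)tp / (tp + fn);
            Console.WriteLine($"t={t}: precision={precision:F3} recall={recall:F3}");
        }
    }
}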

rogancarr commented 5 years ago

Technically, this issue started by stating that we should be able to set the Threshold and ThresholdColumn properties of the BinaryPredictionTransformer. That said, I think the actual use case we care about is changing the Threshold; the ThresholdColumn mostly matters when creating a new BinaryPredictionTransformer, and I don't think we need to make that property modifiable. This issue can be closed simply by allowing a Threshold to be set.

Note that in practice, we have a Score column and a Probability column from our learners; modulo floating-point error, these are a 1:1 mapping, so we will get the same results no matter which one we threshold on. Setting a custom threshold column seems more like a backdoor for crazy things: e.g., creating a custom score based on a few models and/or heuristics and then modifying a BinaryPredictionTransformer to score that column instead.
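
To make the 1:1 point concrete: assuming a sigmoid (Platt-style) calibrator with positive slope, probability is strictly increasing in the raw score, so every probability threshold has exactly one raw-score equivalent. A sketch, with a and b standing in for the trained calibrator's parameters:

// Assumes p = 1 / (1 + exp(-(a * score + b))) with a > 0; a and b would come
// from the trained calibrator, not be hard-coded.
static double ScoreThresholdFor(double probabilityThreshold, double a, double b)
    => (Math.Log(probabilityThreshold / (1 - probabilityThreshold)) - b) / a;

// Deciding "Probability >= 0.15" is then the same rule as
// "Score >= ScoreThresholdFor(0.15, a, b)".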