Closed rogancarr closed 5 years ago
One suggestion is to make the Threshold and ThresholdColumn properties of the prediction transformer settable. They are currently only gettable.
Something like this?
var model = mlContext.BinaryClassification.Trainers.LogisticRegression().Fit(train);
var predictor = model.LastTransformer;
predictor.Threshold = 0.01f;
// or
predictor.SetThreshold(0.01f);
predictor.SetThresholdColumn("Bananas");
@Ivanidzo4ka I am pulling these out of Tests/ and into a new Functional.Tests project that does not have InternalsVisibleTo available, in accordance with #2306. This test relies on internal APIs, so it doesn't work from an end-user perspective.
YAY! The first value of these tests is now realized. 😄
For anyone looking for this functionality, there is a workaround discussed in #2645.
Note that we are doing this exact scenario in our samples:
// The dataset we have is skewed, as there are many more non-spam messages than spam messages.
// While our model is relatively good at detecting the difference, this skewness leads it to always
// say the message is not spam. We deal with this by lowering the threshold of the predictor. In reality,
// it is useful to look at the precision-recall curve to identify the best possible threshold.
var inPipe = new TransformerChain<ITransformer>(model.Take(model.Count() - 1).ToArray());
var lastTransformer = new BinaryPredictionTransformer<IPredictorProducing<float>>(
    mlContext,
    model.LastTransformer.Model,
    inPipe.GetOutputSchema(data.Schema),
    model.LastTransformer.FeatureColumn,
    threshold: 0.15f,
    thresholdColumn: DefaultColumnNames.Probability);
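One way to put the re-thresholded transformer back into a usable pipeline is to recompose the chain. A minimal sketch, reusing the model and lastTransformer variables from above (the Concat call simply swaps in the new last step):
var newModel = new TransformerChain<ITransformer>(
    model.Take(model.Count() - 1)
         .Concat(new ITransformer[] { lastTransformer })
         .ToArray());
// newModel now flags a message as spam whenever the predicted probability
// exceeds 0.15 instead of the default 0.5.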
@CESARDELATORRE
@eerhardt This is broken as of 0.11.
I think we need to fix it. This is a “take back” in functionality that we said we needed for v1.
I've added these back to Project 13. Let's discuss these in scrum today and update the issues once we have made a group decision.
Discussion from scrum: These will be stretch goals for V1 and will be taken up after the rest of the Project 13 issues are exhausted. The justification is that these are rather advanced options, and they are technically possible to implement without helper APIs (see the workaround proposed here).
Since I'm working on this: we technically have two thresholds, one during metric calculation and one at prediction time.
var model = pipeline.Fit(data);
var newModel = mlContext.BinaryClassification.ChangeThreshold(model, -5f);
var engine = model.CreatePredictionEngine<TweetSentiment, Answer>(mlContext);
var newEngine = newModel.CreatePredictionEngine<TweetSentiment, Answer>(mlContext);
var pr = engine.Predict(new TweetSentiment() { SentimentText = "Good Bad job" });
Assert.True(pr.PredictedLabel);
pr = newEngine.Predict(new TweetSentiment() { SentimentText = "Good Bad job" });
Assert.False(pr.PredictedLabel);
// Evaluate the model.
var scoredData = model.Transform(data);
var metrics = mlContext.BinaryClassification.Evaluate(scoredData);
var newScoredData = newModel.Transform(data);
var newMetrics = mlContext.BinaryClassification.Evaluate(newScoredData);
// The evaluator does not respect the modified threshold, so the metrics come out identical.
Assert.Equal(metrics.Accuracy, newMetrics.Accuracy);
I'm making changes which allow the user to change the threshold, but only at prediction time; during metric calculation it remains the same, because we have a Scorer and an Evaluator and they don't respect each other. Should they? @rogancarr @TomFinley
Also, the Evaluator lets the user specify things like:
[Argument(ArgumentType.AtMostOnce, HelpText = "Probability value for classification thresholding")]
public Single Threshold;
[Argument(ArgumentType.AtMostOnce, HelpText = "Use raw score value instead of probability for classification thresholding", ShortName = "useRawScore")]
public bool UseRawScoreThreshold = true;
[Argument(ArgumentType.AtMostOnce, HelpText = "The number of samples to use for p/r curve generation. Specify 0 for no p/r curve generation", ShortName = "numpr")]
public int NumRocExamples;
[Argument(ArgumentType.AtMostOnce, HelpText = "The number of samples to use for AUC calculation. If 0, AUC is not computed. If -1, the whole dataset is used", ShortName = "numauc")]
public int MaxAucExamples = -1;
[Argument(ArgumentType.AtMostOnce, HelpText = "The number of samples to use for AUPRC calculation. Specify 0 for no AUPRC calculation", ShortName = "numauprc")]
public int NumAuPrcExamples = 100000;
and we expose none of these in our Evaluate method.
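For concreteness, one purely hypothetical shape for exposing this (none of the threshold parameters below exist on the public Evaluate method today; the names simply mirror the internal Arguments shown above) could look like:
// Hypothetical proposal only; 'threshold' and 'useRawScoreThreshold' are not real parameters of Evaluate.
var metricsAtCustomThreshold = mlContext.BinaryClassification.Evaluate(
    scoredData,
    labelColumnName: "Label",
    probabilityColumnName: "Probability",
    threshold: 0.15f,              // probability value used for classification thresholding
    useRawScoreThreshold: false);  // threshold on Score instead of Probability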
I think it makes sense to keep them different.
Setting the threshold on the predictor actually changes the pipeline, whereas changing the evaluator lets you ask "what if" scenarios.
Plus the Evaluator needs to threshold for AUC and AUPRC anyways, so we (probably) won't be able to toss that code.
I don't hold this opinion very strongly, though.
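As a concrete illustration of that "what if" style of analysis, here is a minimal sketch (plain LINQ over the scored output, not an evaluator API) that re-thresholds the scoredData from the test above at a few probability cutoffs and recomputes accuracy by hand; it assumes a bool Label column and a float Probability column:
var labels = scoredData.GetColumn<bool>("Label").ToArray();
var probabilities = scoredData.GetColumn<float>("Probability").ToArray();
foreach (var threshold in new[] { 0.25f, 0.5f, 0.75f })
{
    // Re-threshold the existing predictions without touching the model or the pipeline.
    var correct = labels.Zip(probabilities, (label, p) => (p > threshold) == label ? 1 : 0).Sum();
    Console.WriteLine($"threshold={threshold}: accuracy={(double)correct / labels.Length:F3}");
}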
Technically, this issue started by stating that we should be able to set the Threshold and ThresholdColumn properties of the BinaryPredictionTransformer. That said, I think the actual use cases we care about are about changing the Threshold; the ThresholdColumn seems more important when we are creating a new BinaryPredictionTransformer. I actually don't think we need to modify that property. This issue can be closed simply by allowing a Threshold to be set.
Note that in practice, we have a Score column and a Probability column from our learners; modulo floating point error, these are a 1:1 mapping, so we will get the same results no matter which one we threshold on. Setting a custom threshold column seems more like a backdoor for crazy things: e.g. creating a custom score based on a few models and/or heuristics and then modifying a BinaryPredictionTransformer to score that column instead.
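To make the 1:1 mapping concrete: when the Probability column is a sigmoid calibration of the Score column, probability = 1 / (1 + exp(-(slope * score + offset))), a probability threshold converts to the equivalent raw-score threshold by inverting the sigmoid. A minimal sketch; the helper name and the slope/offset values are placeholders, since the actual calibration parameters are learner- and training-run-specific:
// Hypothetical helper: maps a probability threshold to the equivalent raw-score threshold,
// assuming probability = 1 / (1 + exp(-(slope * score + offset))).
static float ProbabilityToScoreThreshold(float probabilityThreshold, float slope, float offset)
{
    // Invert the sigmoid: score = (logit(p) - offset) / slope.
    var logit = Math.Log(probabilityThreshold / (1.0 - probabilityThreshold));
    return (float)((logit - offset) / slope);
}
// With slope = 1 and offset = 0, a probability threshold of 0.5 maps to a score threshold of 0.
var scoreThreshold = ProbabilityToScoreThreshold(0.5f, slope: 1f, offset: 0f);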
It is no longer possible to set a custom Threshold and ThresholdColumn on a binary classifier. Previously, we had been using BinaryPredictionTransformer. Recently, BinaryPredictionTransformer was marked as internal and is no longer available for usage outside of the library. Related to question #403.