dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

ML.NET can't add Evaluate logic into pipeline #7130

Closed muhamedkarajic closed 2 months ago

muhamedkarajic commented 2 months ago

Describe the bug

I want to split the data into a training set and a test set and evaluate the model. Therefore, I created a function:

private static void EvaluateModel(MLContext mlContext, ITransformer trainedModel, IDataView testData)
{
    var predictedData = trainedModel.Transform(testData);

    var metrics = mlContext.BinaryClassification.EvaluateNonCalibrated(predictedData, "target", "Score", "PredictedLabel");

    Console.WriteLine($"Accuracy: {metrics.Accuracy: 0.###}");
    Console.WriteLine($"---------------------------");
    Console.WriteLine($"Confusion Matrix");
    Console.WriteLine(metrics.ConfusionMatrix.GetFormattedConfusionTable());
    Console.WriteLine();
}

To Reproduce

Steps to reproduce the behavior:

  1. Run mlnet classification --dataset "./FILE_PATH/FILE_NAME.csv" --label-col 11 --has-header true --train-time 60
  2. Add the EvaluateModel function to SampleClassification.training.cs
  3. Adjust the existing Train function as shown under "Screenshots, Code, Sample Projects"
  4. Run dotnet run
  5. Error: An unhandled exception of type 'System.ArgumentOutOfRangeException' occurred in Microsoft.ML.Data.dll: 'Schema mismatch for score column 'Score': expected Single, got Vector<Single, 2>'

Expected behavior

I would expect the code to work, since the Score column is something ML.NET creates. It seems the evaluator expects the score to be a single value, while it is actually a vector (Vector<Single, 2>).
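
One way to check which evaluator matches a model's output is to inspect the type of the Score column after Transform. The following is a minimal sketch, not code from the issue: the class name SchemaSketch and the helper PrintScoreColumnType are hypothetical, and it assumes the Microsoft.ML NuGet package is referenced.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class SchemaSketch
{
    // Hypothetical helper: prints whether the "Score" column is a scalar or a vector.
    public static void PrintScoreColumnType(ITransformer trainedModel, IDataView testData)
    {
        IDataView predictedData = trainedModel.Transform(testData);

        // GetColumnOrNull returns null when the column does not exist.
        DataViewSchema.Column? scoreColumn = predictedData.Schema.GetColumnOrNull("Score");
        if (scoreColumn == null)
        {
            Console.WriteLine("No 'Score' column in the transformed data.");
            return;
        }

        DataViewType scoreType = scoreColumn.Value.Type;
        if (scoreType is VectorDataViewType vectorType)
        {
            // A vector score (one entry per class) points to a multiclass model.
            Console.WriteLine($"Score is a vector of size {vectorType.Size}: use the multiclass evaluator.");
        }
        else
        {
            // A scalar score is what the binary evaluators expect.
            Console.WriteLine($"Score is a scalar of type {scoreType}: use a binary evaluator.");
        }
    }
}
```

A Score column of type Vector<Single, 2>, as in the exception message above, would take the first branch.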

Screenshots, Code, Sample Projects

Here is the adjusted train function:

public static void Train(string outputModelPath, string inputDataFilePath = RetrainFilePath, char separatorChar = RetrainSeparatorChar, bool hasHeader = RetrainHasHeader)
{
    var mlContext = new MLContext();

    var data = LoadIDataViewFromFile(mlContext, inputDataFilePath, separatorChar, hasHeader);
    var splitedData = mlContext.Data.TrainTestSplit(data, 0.2, null, 0);
    var model = RetrainModel(mlContext, splitedData.TrainSet);
    EvaluateModel(mlContext, model, splitedData.TestSet); // evaluate on the held-out 20%, not the full dataset
    SaveModel(mlContext, model, data, outputModelPath);
}

Additional context

I have found that I am supposed to use EvaluateNonCalibrated instead of Evaluate. I get a similar problem when using Evaluate: it complains about a missing column. The error in that case is: Probability column 'Probability' not found (Parameter 'schema').

muhamedkarajic commented 2 months ago

The issue was the following code which was generated by ML.NET:

mlContext.MulticlassClassification.Trainers.OneVersusAll(binaryEstimator: mlContext.BinaryClassification.Trainers.FastTree(...), labelColumnName: @"target");

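I assumed this pipeline produces a binary classification model, while it actually produces a multiclass one: OneVersusAll wraps the binary FastTree trainer, so the Score column is a vector with one entry per class. I quickly solved it by switching to the multiclass evaluator: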

var metrics = mlContext.MulticlassClassification.Evaluate(predictedData, "target", "Score", "PredictedLabel");
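
For reference, the whole evaluation function with this fix applied might look like the sketch below. It is an illustration under assumptions, not the author's final code: the class name EvaluationSketch is hypothetical, the label column is assumed to be "target" as in the issue, and the Microsoft.ML package is assumed to be referenced. Note that MulticlassClassificationMetrics has no Accuracy property, so the sketch prints MicroAccuracy and MacroAccuracy instead.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class EvaluationSketch
{
    // Hypothetical rewrite of EvaluateModel from the issue, using the multiclass evaluator.
    public static void EvaluateModel(MLContext mlContext, ITransformer trainedModel, IDataView testData)
    {
        // Apply the trained pipeline to the held-out data to obtain Score and PredictedLabel.
        IDataView predictedData = trainedModel.Transform(testData);

        // The multiclass evaluator accepts a vector-valued Score column.
        MulticlassClassificationMetrics metrics = mlContext.MulticlassClassification.Evaluate(
            predictedData,
            labelColumnName: "target",
            scoreColumnName: "Score",
            predictedLabelColumnName: "PredictedLabel");

        Console.WriteLine($"Micro accuracy: {metrics.MicroAccuracy:0.###}");
        Console.WriteLine($"Macro accuracy: {metrics.MacroAccuracy:0.###}");
        Console.WriteLine("---------------------------");
        Console.WriteLine("Confusion Matrix");
        Console.WriteLine(metrics.ConfusionMatrix.GetFormattedConfusionTable());
    }
}
```

To measure generalization, the function would be called with the TestSet returned by TrainTestSplit rather than with the full dataset.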