dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

LightGBM is producing different multiclass scores after loading saved model #4051

Closed robinmohseni closed 5 years ago

robinmohseni commented 5 years ago

System information

Windows 10
Microsoft.ML (1.2.0)
Microsoft.ML.LightGbm (1.2.0)
.NET Core 2.2

Issue

After training a LightGBM model, the model produces multiclass scores in [0, 1] that sum to 1, as expected.

However, after saving the model and then loading it into a new trainedModel object, the scores are no longer probabilities but unnormalized decimal values, some of them negative.

I have tested saving and loading with other model types and cannot reproduce the problem. It occurs only with the LightGBM model.

Please advise. I am now attempting to roll back library versions to see whether it is still an issue.

Source code / logs

Before saving model...
0.003305528 0.01293249 0.01907223 0.9646355 5.421485E-05 3.556848E-08

After saving model...
-3.623514 -2.259367 -1.870877 2.05264 -7.733911 -15.06316


Source code

        mlContext.Model.Save(trainedModel, dataView.Schema, _modelPath);

        // Save Data Prep transformer
        //mlContext.Model.Save(pipeline, dataView.Schema, "data_preparation_pipeline.zip");

        schema = dataView.Schema;

        Console.WriteLine("Before saving model...");
        TestModelOutput(mlContext, trainedModel);

        // Load trained model
        trainedModel = mlContext.Model.Load(_modelPath, out schema);
        //trainedModel = mlContext.Model.LoadWithDataLoader()

        Console.WriteLine("After saving model...");
        TestModelOutput(mlContext, trainedModel);

private static void TestModelOutput(MLContext mlContext, ITransformer model)
{
    // testActions is not a parameter; it is assumed to be a field defined elsewhere in the class.
    IDataView batchData = mlContext.Data.LoadFromEnumerable(testActions);

    // Score every row with the given model.
    IDataView predictions = model.Transform(batchData);

    IEnumerable<PredictionData> predictedResults = mlContext.Data
        .CreateEnumerable<PredictionData>(predictions, reuseRowObject: false);

    // Print each per-class score on its own line.
    foreach (var item in predictedResults)
    {
        foreach (var score in item.Score)
        {
            Console.WriteLine(score);
        }
    }
}

robinmohseni commented 5 years ago

Possibly related: https://github.com/dotnet/machinelearning/issues/3647

robinmohseni commented 5 years ago

If I apply the softmax transformation, exp(x) / sum(exp(x)), to the scores from the loaded model, I can reproduce the expected probabilities. Obviously this is not an ideal workaround.
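
For anyone who needs the workaround in the meantime, here is a minimal sketch of the softmax step in plain C#. It only illustrates the transformation described above and is not a fix for the underlying save/load behaviour. SoftmaxSketch and Softmax are hypothetical names (not part of ML.NET), and the input array is the "After saving model..." output from the original report. Subtracting the maximum score before exponentiating is a standard numerical-stability trick and does not change the result.

```csharp
using System;
using System.Linq;

public static class SoftmaxSketch
{
    // Hypothetical helper, not an ML.NET API: turns raw scores into
    // probabilities via exp(x) / sum(exp(x)).
    public static float[] Softmax(float[] rawScores)
    {
        // Shift by the max so Math.Exp never overflows; the ratio is unchanged.
        float max = rawScores.Max();
        double[] exps = rawScores.Select(s => Math.Exp(s - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => (float)(e / sum)).ToArray();
    }

    public static void Main()
    {
        // Raw scores reported after loading the saved model.
        float[] loaded =
        {
            -3.623514f, -2.259367f, -1.870877f, 2.05264f, -7.733911f, -15.06316f
        };

        float[] probs = Softmax(loaded);

        foreach (float p in probs)
        {
            Console.WriteLine(p);
        }

        Console.WriteLine("Sum: " + probs.Sum());
    }
}
```

With the scores from the report, the fourth class comes out at roughly 0.9646 and the values sum to approximately 1, which lines up with the "Before saving model..." output. In the TestModelOutput method this would mean calling Softmax(item.Score) before printing, assuming item.Score is a float[].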