dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

LightGBM Multiclassification trainer returning error code -1 "Number of classes should be specified and greater than 1 for multiclass training" but I can't see where to specify the number of classes #6844

Open raymond130 opened 9 months ago

raymond130 commented 9 months ago

I tried to locate the source of this error in the source code, but I can't figure out how to define the number of classes in the trainer. My label column is one-hot encoded, so it should have two classes if I've interpreted the documentation correctly. I'm not sure where the error is coming from.

To Reproduce: Run the LightGbmMulticlass trainer and try to train with it.

Expected behavior: It should train properly.

Screenshots, Code, Sample Projects: Below are two pictures of my code where I define the expected labels and features, and where I pass in the data I use.

Here is the code written out:

For the PrepareData method:

    IEstimator<ITransformer> dataPipeline = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "Label", inputColumnName: nameof(PrMaintenanceClass.failure))
        //encode model column
        .Append(mlContext.Transforms.Categorical.OneHotEncoding("model", outputKind: OneHotEncodingEstimator.OutputKind.Indicator))
        //define features column
        .Append(mlContext.Transforms.Concatenate("Features",
            nameof(PrMaintenanceClass.voltmean_3hrs), nameof(PrMaintenanceClass.rotatemean_3hrs),
            nameof(PrMaintenanceClass.pressuremean_3hrs), nameof(PrMaintenanceClass.vibrationmean_3hrs),
            nameof(PrMaintenanceClass.voltstd_3hrs), nameof(PrMaintenanceClass.rotatestd_3hrs),
            nameof(PrMaintenanceClass.pressurestd_3hrs), nameof(PrMaintenanceClass.vibrationstd_3hrs),
            nameof(PrMaintenanceClass.voltmean_24hrs), nameof(PrMaintenanceClass.rotatemean_24hrs),
            nameof(PrMaintenanceClass.pressuremean_24hrs),
            nameof(PrMaintenanceClass.vibrationmean_24hrs),
            nameof(PrMaintenanceClass.voltstd_24hrs), nameof(PrMaintenanceClass.rotatestd_24hrs),
            nameof(PrMaintenanceClass.pressurestd_24hrs), nameof(PrMaintenanceClass.vibrationstd_24hrs),
            nameof(PrMaintenanceClass.error1count), nameof(PrMaintenanceClass.error2count),
            nameof(PrMaintenanceClass.error3count), nameof(PrMaintenanceClass.error4count),
            nameof(PrMaintenanceClass.error5count), nameof(PrMaintenanceClass.sincelastcomp1),
            nameof(PrMaintenanceClass.sincelastcomp2), nameof(PrMaintenanceClass.sincelastcomp3),
            nameof(PrMaintenanceClass.sincelastcomp4),
            nameof(PrMaintenanceClass.model), nameof(PrMaintenanceClass.age)));

    return dataPipeline;

and for the train method:

    var transformationPipeline = PrepareData(mlContext);

    //settings hyper parameters
    TrainerOptions = new LightGbmMulticlassTrainer.Options();
    TrainerOptions.FeatureColumnName = "Features";
    TrainerOptions.LabelColumnName = "Label";
    TrainerOptions.LearningRate = 0.005;
    TrainerOptions.NumberOfLeaves = 70;
    TrainerOptions.NumberOfIterations = 2000;
    TrainerOptions.NumberOfLeaves = 50;
    TrainerOptions.UnbalancedSets = true;
    TrainerOptions.Sigmoid = 0.2;

    var boost = new DartBooster.Options();
    boost.XgboostDartMode = true;
    boost.MaximumTreeDepth = 25;
    TrainerOptions.Booster = boost;

    // Define LightGbm algorithm estimator
    IEstimator<ITransformer> lightGbm = mlContext.MulticlassClassification.Trainers.LightGbm(TrainerOptions);

    //train the ML model
    TransformerChain<ITransformer> model = transformationPipeline.Append(lightGbm).Fit(preparedData);

    //return trained model for evaluation
    return model;

Additional context: I hope this helps! I feel like I made a simple error.

feiyun0112 commented 9 months ago

Please check the distinct values of PrMaintenanceClass.failure in preparedData.
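One way to run that check is to count how many rows carry each distinct label value before training. The sketch below is a minimal, hedged illustration in plain C#: `LabelCheck`, `CountLabels`, and `HasEnoughClasses` are hypothetical helper names, and the sample values are made up. It assumes the `failure` column is loaded as a string; in that case the values could be pulled from the data view with ML.NET's `GetColumn<string>` extension, as noted in the comment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LabelCheck
{
    // Counts how many rows carry each distinct label value.
    public static Dictionary<string, int> CountLabels(IEnumerable<string> labels) =>
        labels.GroupBy(v => v).ToDictionary(g => g.Key, g => g.Count());

    // Multiclass training needs at least two distinct label values.
    public static bool HasEnoughClasses(IEnumerable<string> labels) =>
        CountLabels(labels).Count > 1;

    public static void Main()
    {
        // Hypothetical label values. With ML.NET they could instead come from
        // preparedData.GetColumn<string>(nameof(PrMaintenanceClass.failure)),
        // assuming the failure column is a string (an assumption, not confirmed here).
        var labels = new[] { "none", "none", "none" };

        foreach (var pair in CountLabels(labels))
            Console.WriteLine($"{pair.Key}: {pair.Value}");

        if (!HasEnoughClasses(labels))
            Console.WriteLine("Only one distinct label value found; multiclass training needs at least two.");
    }
}
```

If the output lists a single value, the label column is the likely culprit rather than the trainer options.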

tearlant commented 8 months ago

@raymond130 I got this to work (using the IrisData set, essentially just a modification of the MulticlassClassification_Iris sample using your code)

        private static void BuildTrainEvaluateAndSaveModelOneHot(MLContext mlContext)
        {
            var trainingDataView = mlContext.Data.LoadFromTextFile<IrisData>(TrainDataPath, hasHeader: true);
            var testDataView = mlContext.Data.LoadFromTextFile<IrisData>(TestDataPath, hasHeader: true);

            var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "KeyColumn", inputColumnName: nameof(IrisData.Label))
                .Append(mlContext.Transforms.Categorical.OneHotEncoding("KeyColumn", outputKind: OneHotEncodingEstimator.OutputKind.Key))
                .Append(mlContext.Transforms.Concatenate("Features", nameof(IrisData.SepalLength), nameof(IrisData.SepalWidth), nameof(IrisData.PetalLength), nameof(IrisData.PetalWidth))
                .AppendCacheCheckpoint(mlContext));

            //settings hyper parameters
            var TrainerOptions = new LightGbmMulticlassTrainer.Options();
            TrainerOptions.FeatureColumnName = "Features";
            TrainerOptions.LabelColumnName = "KeyColumn";
            TrainerOptions.LearningRate = 0.005;
            TrainerOptions.NumberOfLeaves = 70;
            TrainerOptions.NumberOfIterations = 2000;
            TrainerOptions.NumberOfLeaves = 50;
            TrainerOptions.UnbalancedSets = true;
            TrainerOptions.Sigmoid = 0.2;

            var boost = new DartBooster.Options();
            boost.XgboostDartMode = true;
            boost.MaximumTreeDepth = 25;
            TrainerOptions.Booster = boost;

            // Define LightGbm algorithm estimator
            IEstimator<ITransformer> lightGbm = mlContext.MulticlassClassification.Trainers.LightGbm(TrainerOptions);
            var transformationPipeline = dataProcessPipeline.Append(lightGbm);

            //train the ML model
            TransformerChain<ITransformer> trainedModel = transformationPipeline.Fit(trainingDataView);

            // evaluate the model and show accuracy stats
            Console.WriteLine("===== Evaluating Model's accuracy with Test data =====");
            var predictions = trainedModel.Transform(testDataView);
            var metrics = mlContext.MulticlassClassification.Evaluate(predictions, "Label", "Score");

            Common.ConsoleHelper.PrintMultiClassClassificationMetrics(lightGbm.ToString(), metrics);

            // Save/persist the trained model to a .ZIP file
            mlContext.Model.Save(trainedModel, trainingDataView.Schema, ModelPathLightGbm);
            Console.WriteLine("The model is saved to {0}", ModelPathLightGbm);
        }

Note a couple of things that I had to change from your code: the outputColumnName of the first transform (MapValueToKey) had to be the input column name for OneHotEncoding, and it also needs to match TrainerOptions.LabelColumnName.

raymond130 commented 5 months ago

Hi there - I was able to get this fixed! @feiyun0112 was correct - my dataset had an improperly marked failure class that contained all the same values. @tearlant thank you for the additional help with the proofreading!

Thank you all so much!