Open raymond130 opened 1 year ago
Please check the distinct values of PrMaintenanceClass.failure in preparedData.
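One way to run that check is sketched below. It is a minimal, standalone C# example (top-level statements, C# 9 or later) that counts the distinct values in a label column. `CountLabelValues` is a hypothetical helper, not part of ML.NET, and the sample values are made up for illustration. With ML.NET, the real values could likely be read with `GetColumn<string>` on the data view, assuming the failure column holds text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of ML.NET): counts how often each label value occurs.
static Dictionary<string, int> CountLabelValues(IEnumerable<string> labels) =>
    labels.GroupBy(label => label)
          .ToDictionary(g => g.Key, g => g.Count());

// Made-up sample values standing in for the failure column.
// With ML.NET, the real values could be read with
// preparedData.GetColumn<string>(nameof(PrMaintenanceClass.failure)),
// assuming the column is stored as text.
var failureValues = new[] { "none", "none", "none", "none" };

var counts = CountLabelValues(failureValues);
foreach (var pair in counts)
{
    Console.WriteLine($"{pair.Key}: {pair.Value} rows");
}

// A multiclass trainer needs at least two distinct label values.
var hasEnoughClasses = counts.Count > 1;
Console.WriteLine(hasEnoughClasses
    ? "Label column looks usable for multiclass training."
    : "Only one distinct label value found; the trainer will see a single class.");
```

If only one distinct value shows up, MapValueToKey will produce a single key, which would be consistent with the "Number of classes should be specified and greater than 1" error.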
@raymond130 I got this to work using the Iris dataset (essentially just a modification of the MulticlassClassification_Iris sample, using your code):
private static void BuildTrainEvaluateAndSaveModelOneHot(MLContext mlContext)
{
var trainingDataView = mlContext.Data.LoadFromTextFile<IrisData>(TrainDataPath, hasHeader: true);
var testDataView = mlContext.Data.LoadFromTextFile<IrisData>(TestDataPath, hasHeader: true);
var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "KeyColumn", inputColumnName: nameof(IrisData.Label))
.Append(mlContext.Transforms.Categorical.OneHotEncoding("KeyColumn", outputKind: OneHotEncodingEstimator.OutputKind.Key))
.Append(mlContext.Transforms.Concatenate("Features", nameof(IrisData.SepalLength), nameof(IrisData.SepalWidth), nameof(IrisData.PetalLength), nameof(IrisData.PetalWidth))
.AppendCacheCheckpoint(mlContext));
// Set hyperparameters
var TrainerOptions = new LightGbmMulticlassTrainer.Options();
TrainerOptions.FeatureColumnName = "Features";
TrainerOptions.LabelColumnName = "KeyColumn";
TrainerOptions.LearningRate = 0.005;
TrainerOptions.NumberOfIterations = 2000;
TrainerOptions.NumberOfLeaves = 50;
TrainerOptions.UnbalancedSets = true;
TrainerOptions.Sigmoid = 0.2;
var boost = new DartBooster.Options();
boost.XgboostDartMode = true;
boost.MaximumTreeDepth = 25;
TrainerOptions.Booster = boost;
// Define LightGbm algorithm estimator
IEstimator<ITransformer> lightGbm = mlContext.MulticlassClassification.Trainers.LightGbm(TrainerOptions);
var transformationPipeline = dataProcessPipeline.Append(lightGbm);
//train the ML model
TransformerChain<ITransformer> trainedModel = transformationPipeline.Fit(trainingDataView);
// evaluate the model and show accuracy stats
Console.WriteLine("===== Evaluating Model's accuracy with Test data =====");
var predictions = trainedModel.Transform(testDataView);
var metrics = mlContext.MulticlassClassification.Evaluate(predictions, "Label", "Score");
Common.ConsoleHelper.PrintMultiClassClassificationMetrics(lightGbm.ToString(), metrics);
// Save/persist the trained model to a .ZIP file
mlContext.Model.Save(trainedModel, trainingDataView.Schema, ModelPathLightGbm);
Console.WriteLine("The model is saved to {0}", ModelPathLightGbm);
}
Note a couple of things that I had to change from your code: the outputColumnName of the first transform (MapValueToKey) had to be the input column name to OneHotEncoding, and it also needs to match TrainerOptions.LabelColumnName.
Hi there - I was able to get this fixed! @feiyun0112 was correct - my dataset had an improperly marked failure class that contained all the same values. @tearlant thank you for the additional help with the proofreading!
Thank you all so much!
Describe the bug: This bug occurs when I call transformationPipeline.Fit(data) with the LightGbm trainer after filling out the appropriate options. I get the error "LightGBM Error, code is -1, error message is 'Number of classes should be specified and greater than 1 for multiclass training'."
I tried to locate the source of this error in the source code, but I can't figure out how to define the number of classes in the trainer. My label column is one-hot encoded, so it should have two classes if I've interpreted the documentation correctly. I'm not sure where the error is coming from.
To Reproduce: Run the LightGbmMulticlass trainer and try to train with it.
Expected behavior: The model should train properly.
Screenshots, Code, Sample Projects: Below are two pictures of my code where I define the expected labels and features, and where I pass in the data I use.
Here is the code written out:
For the PrepareData method:
IEstimator<ITransformer> dataPipeline =
mlContext.Transforms.Conversion.MapValueToKey
(outputColumnName: "Label", inputColumnName: nameof(PrMaintenanceClass.failure))
// Encode the model column
.Append(mlContext.Transforms.Categorical.OneHotEncoding
("model", outputKind: OneHotEncodingEstimator.OutputKind.Indicator))
And for the train method:
var transformationPipeline = PrepareData(mlContext);
Additional context: I hope this helps! I feel like I made a simple error.