Issue Description
I ran into a problem while retraining a model with ML.NET. Retraining works when the new data contains only labels that already exist in the original training data, but the call below fails when the new data introduces labels that were not present in the original training data:
// Retrain model
var retrainedModel = mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(
    new LbfgsMaximumEntropyMulticlassTrainer.Options()
    {
        L1Regularization = 0.1195667F,
        L2Regularization = 0.03125F,
        LabelColumnName = @"col1",
        FeatureColumnName = @"Features"
    }).Fit(transformedNewData, originalModelParameters);
Error Message
System.InvalidOperationException: 'No valid training instances found, all instances have missing features.'
Steps to Reproduce
Train an initial model using a dataset with a specific set of labels.
Attempt to retrain the model using a new dataset that includes labels not present in the original dataset.
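To confirm that a failing run really involves labels the original model never saw, the two label sets can be compared before calling Fit. The following is a minimal sketch and not part of the original repro: `LabelDiff` and `FindUnseenLabels` are hypothetical names, and it assumes the labels are plain strings and that the original label set is available separately (for example, from the original training file).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LabelDiff
{
    // Returns the distinct labels in newLabels that do not occur in originalLabels.
    // Comparison is ordinal and case-sensitive (an assumption of this sketch).
    public static IReadOnlyList<string> FindUnseenLabels(
        IEnumerable<string> originalLabels,
        IEnumerable<string> newLabels)
    {
        var known = new HashSet<string>(originalLabels, StringComparer.Ordinal);
        return newLabels
            .Where(label => !known.Contains(label))
            .Distinct(StringComparer.Ordinal)
            .ToList();
    }
}
```

If the returned list is non-empty, the retraining data contains at least one label outside the original set, which is the case that fails in this report.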
Expected Behavior
The model should be able to retrain successfully even when new labels are introduced in the retraining dataset.
Actual Behavior
The retraining process fails with an InvalidOperationException, stating that there are no valid training instances because all instances have missing features.
Environment
ML.NET version: 3.0.1
.NET version: net8.0
Operating System: Windows 10
Code Sample
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

public static void ReTrain(string outputModelPath, IEnumerable<ModelInput> newDatas)
{
    var mlContext = new MLContext();

    // Define DataViewSchema of data prep pipeline and trained model
    DataViewSchema dataPrepPipelineSchema, modelSchema;

    // Load data preparation pipeline and trained model
    var dataPrepPipeline = mlContext.Model.Load("data_preparation_pipeline.zip", out dataPrepPipelineSchema);
    var trainedModel = mlContext.Model.Load("ogd_model.zip", out modelSchema);

    // Extract trained model parameters
    var transformers = (IEnumerable<ITransformer>)trainedModel;
    var originalModelParameters = ((MulticlassPredictionTransformer<MaximumEntropyModelParameters>?)transformers.FirstOrDefault(x => x is MulticlassPredictionTransformer<MaximumEntropyModelParameters>))?.Model;

    // Load new data
    var newDataView = mlContext.Data.LoadFromEnumerable(newDatas);

    // Preprocess data with the pipeline that was fitted on the original data
    var transformedNewData = dataPrepPipeline.Transform(newDataView);

    // Retrain model, warm-starting from the original parameters
    var retrainedModel = mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(
        new LbfgsMaximumEntropyMulticlassTrainer.Options()
        {
            L1Regularization = 0.1195667F,
            L2Regularization = 0.03125F,
            LabelColumnName = @"col1",
            FeatureColumnName = @"Features"
        }).Fit(transformedNewData, originalModelParameters);
}
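For comparison, one fallback that does accommodate new labels is to skip the warm start and refit the label mapping, the featurization, and the trainer on the combined old and new data. This is a sketch under assumptions, not the behavior this report asks for: the `SketchInput` shape (a text column `col0` and a string label `col1`), the `FeaturizeText` step, and the `FullRefit` helper are all hypothetical, because the real data preparation steps live in data_preparation_pipeline.zip and are not shown here. Only the trainer options are taken from the code above.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

// Hypothetical input shape; the real ModelInput is not shown in this report.
public class SketchInput
{
    [ColumnName("col0")]
    public string Text { get; set; } = string.Empty;

    [ColumnName("col1")]
    public string Label { get; set; } = string.Empty;
}

public static class FullRefit
{
    // Refits label mapping, featurization, and trainer on old + new rows,
    // so the key mapping for col1 is rebuilt from every label present.
    public static ITransformer Fit(
        MLContext mlContext,
        IEnumerable<SketchInput> originalRows,
        IEnumerable<SketchInput> newRows)
    {
        IDataView combined = mlContext.Data.LoadFromEnumerable(originalRows.Concat(newRows));

        var pipeline = mlContext.Transforms.Conversion.MapValueToKey("col1", "col1")
            // Assumed featurization step; replace with the real data prep steps.
            .Append(mlContext.Transforms.Text.FeaturizeText("Features", "col0"))
            .Append(mlContext.MulticlassClassification.Trainers.LbfgsMaximumEntropy(
                new LbfgsMaximumEntropyMulticlassTrainer.Options
                {
                    L1Regularization = 0.1195667F,
                    L2Regularization = 0.03125F,
                    LabelColumnName = "col1",
                    FeatureColumnName = "Features"
                }))
            .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));

        // No warm start: Fit is called without the original model parameters.
        return pipeline.Fit(combined);
    }
}
```

The trade-off is that originalModelParameters is not reused, so this costs a full training pass and requires keeping the original training data around.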