dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Unable to remove SdcaLogisticRegressionOva from AutoML Multiclassification Experiment #7025

Open. bettwedder opened this issue 4 months ago.



Describe the bug: When creating an AutoML multiclass classification experiment, you cannot remove the trainer "SdcaLogisticRegressionOva".

To Reproduce:
Steps to reproduce the behavior:

  1. Create a MulticlassExperimentSettings object.
  2. Iterate over settings.Trainers and remove every trainer that is not "LightGbm" or "FastForest".
  3. Create a multiclass progress reporter that writes out the TrainerName of each run.
  4. Clean up the TrainerName value, which is malformed in the current versions (3.0.1 and 0.21.1), with this chain of Replace calls:
    TrainerName.Replace("Multi", "").Replace("ReplaceMissingValues", "").Replace("Concatenate", "").Replace("Unknown", "").Replace("=>", "");
  5. Run the experiment and monitor the trainer names.
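The Replace chain in step 4 can be wrapped in a small helper so that every progress report cleans the name the same way. This is a sketch only: `TrainerNameCleaner` and `Clean` are hypothetical names, not ML.NET APIs, and the fragment list simply mirrors the chain above in the same order.

```csharp
using System;

// Hypothetical helper (not part of ML.NET) that applies the same Replace chain
// as step 4, in the same order, to a trainer name string.
public static class TrainerNameCleaner
{
    // Fragments stripped from the malformed name, in the order used in step 4.
    private static readonly string[] NoiseFragments =
    {
        "Multi", "ReplaceMissingValues", "Concatenate", "Unknown", "=>"
    };

    public static string Clean(string trainerName)
    {
        if (trainerName is null)
        {
            throw new ArgumentNullException(nameof(trainerName));
        }

        string cleaned = trainerName;
        foreach (string fragment in NoiseFragments)
        {
            // string.Replace removes every occurrence of the fragment.
            cleaned = cleaned.Replace(fragment, "");
        }
        return cleaned;
    }
}
```

Inside a reporter's `Report(RunDetail<MulticlassClassificationMetrics> value)` method, one might then call `Console.WriteLine(TrainerNameCleaner.Clean(value.TrainerName));`. Because `Replace` removes every occurrence, a genuine trainer name that happened to contain "Multi" would be altered too, so this is only a display workaround.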

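Expected behavior: Only the trainers left in settings.Trainers (LightGbm and FastForest) are used. Instead, one of the first three models uses the unremovable SdcaLogisticRegressionOva trainer.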

Screenshots, Code, Sample Projects


    MulticlassExperimentSettings settings = new MulticlassExperimentSettings()
    {
        OptimizingMetric = optimizeMetric,
        MaxExperimentTimeInSeconds = experimentTime,
        CacheDirectoryName = cacheDir,
        CancellationToken = cts.Token,
        CacheBeforeTrainer = CacheBeforeTrainer.On
    };

    bool keptLightGBM = false;
    // ToList() copies the collection so trainers can be removed while iterating.
    foreach (var trainer in settings.Trainers.ToList())
    {
        // Keep only the LightGbm and FastForest trainers; remove everything else.
        if (!trainer.ToString().ToUpperInvariant().Contains("LIGHTGBM") && !trainer.ToString().ToUpperInvariant().Contains("FASTFOREST"))
        {
            settings.Trainers.Remove(trainer);
            Console.WriteLine("Removed Trainer: " + trainer.ToString());
        }
        //else
        //{
        //    if (keptLightGBM)
        //    {
        //        settings.Trainers.Remove(trainer);
        //        Console.WriteLine("Removed Extra \"LightGbm\" Trainer: " + trainer.ToString());
        //    }
        //    else
        //        keptLightGBM = true;
        //}
    }

    MulticlassClassificationExperiment experiment = context.Auto().CreateMulticlassClassificationExperiment(settings);
    ExperimentResult<MulticlassClassificationMetrics> result;

    // The null argument means no pre-featurizer; the reporter prints each run's TrainerName.
    result = experiment.Execute(trainData, splitTestData, columnInformation, null, new MulticlassProgressReporter() { labelColumnName = label, CacheDir = cacheDir, ExperimentTime = DateTime.Now });

This code produces this output:

[Screenshot: console output]

Additional context: If LightGbm is left as the only trainer, AutoML still uses "SdcaLogisticRegressionOva" on every other run.

The trainer "SdcaLogisticRegressionOva" does not appear in settings.Trainers after the settings object is created, even though that list is supposed to be populated with every available trainer. Also, iterating over the auto-populated list yields two items named "LightGbm".

Finally, peeking at the definition of the Microsoft.ML.AutoML.MulticlassClassificationTrainer enum shows the following list, which also lacks "SdcaLogisticRegressionOva":

// Decompiled with JetBrains decompiler
// Type: Microsoft.ML.AutoML.MulticlassClassificationTrainer
// Assembly: Microsoft.ML.AutoML, Version=1.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51
// MVID: 5D7A79B7-CF20-433B-A534-1ED92C335230
// Assembly location: C:\Users\xxxx\.nuget\packages\microsoft.ml.automl\0.21.1\lib\netstandard2.0\Microsoft.ML.AutoML.dll
// XML documentation location: C:\Users\xxxx\.nuget\packages\microsoft.ml.automl\0.21.1\lib\netstandard2.0\Microsoft.ML.AutoML.xml

#nullable disable
namespace Microsoft.ML.AutoML
{
  /// <summary>
  /// Enumeration of ML.NET multiclass classification trainers used by AutoML.
  /// </summary>
  public enum MulticlassClassificationTrainer
  {
    /// <summary>
    /// <see cref="T:Microsoft.ML.Trainers.OneVersusAllTrainer" /> using <see cref="T:Microsoft.ML.Trainers.FastTree.FastForestBinaryTrainer" />.
    /// </summary>
    FastForestOva,
    /// <summary>
    /// <see cref="T:Microsoft.ML.Trainers.OneVersusAllTrainer" /> using <see cref="T:Microsoft.ML.Trainers.FastTree.FastTreeBinaryTrainer" />.
    /// </summary>
    FastTreeOva,
    /// <summary>
    /// See <see cref="T:Microsoft.ML.Trainers.LightGbm.LightGbmMulticlassTrainer" />.
    /// </summary>
    LightGbm,
    /// <summary>
    /// See <see cref="T:Microsoft.ML.Trainers.LbfgsMaximumEntropyMulticlassTrainer" />.
    /// </summary>
    LbfgsMaximumEntropy,
    /// <summary>
    /// <see cref="T:Microsoft.ML.Trainers.OneVersusAllTrainer" /> using <see cref="T:Microsoft.ML.Trainers.LbfgsLogisticRegressionBinaryTrainer" />.
    /// </summary>
    LbfgsLogisticRegressionOva,
    /// <summary>
    /// See <see cref="T:Microsoft.ML.Trainers.SdcaMaximumEntropyMulticlassTrainer" />.
    /// </summary>
    SdcaMaximumEntropy,
  }
}
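To illustrate why removing items from settings.Trainers cannot reach this trainer, the sketch below copies the six members of the decompiled enum into a local enum with the hypothetical name `MirroredTrainer`. It builds the full list (as the settings object is described above as doing), filters it by enum value rather than by upper-cased `ToString()` text, and checks that "SdcaLogisticRegressionOva" does not parse as a member. `TrainerFilter`, `AllTrainers`, `KeepOnly`, and `IsKnownTrainer` are illustrative names, not AutoML APIs; real code would use `Microsoft.ML.AutoML.MulticlassClassificationTrainer` directly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical local copy of the six members in the decompiled AutoML 0.21.1 enum.
public enum MirroredTrainer
{
    FastForestOva,
    FastTreeOva,
    LightGbm,
    LbfgsMaximumEntropy,
    LbfgsLogisticRegressionOva,
    SdcaMaximumEntropy,
}

public static class TrainerFilter
{
    // Every enum member, in declaration order.
    public static List<MirroredTrainer> AllTrainers() =>
        Enum.GetValues(typeof(MirroredTrainer)).Cast<MirroredTrainer>().ToList();

    // Keeps only the allowed trainers, comparing enum values instead of strings.
    public static List<MirroredTrainer> KeepOnly(
        IEnumerable<MirroredTrainer> trainers, params MirroredTrainer[] allowed)
    {
        var allowedSet = new HashSet<MirroredTrainer>(allowed);
        return trainers.Where(t => allowedSet.Contains(t)).ToList();
    }

    // True only when the name is a declared member of the enum.
    public static bool IsKnownTrainer(string name) =>
        Enum.TryParse(name, ignoreCase: false, out MirroredTrainer _);
}
```

With the real settings object, the equivalent would presumably be `settings.Trainers.Clear()` followed by `settings.Trainers.Add(MulticlassClassificationTrainer.LightGbm)` and `settings.Trainers.Add(MulticlassClassificationTrainer.FastForestOva)`. Either way, filtering this collection can only exclude trainers that have an enum member, which is consistent with SdcaLogisticRegressionOva still being used in the runs reported here.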