dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Training never returns no matter what is set for SetTrainingTimeInSeconds #6567

Open gbanuk2 opened 1 year ago

gbanuk2 commented 1 year ago

Describe the bug

No matter what I set for SetTrainingTimeInSeconds, running the experiment never returns. I'm using the example from: https://learn.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/how-to-use-the-automl-api

To Reproduce

        public static void Train()
        {
            // Initialize MLContext
            MLContext ctx = new MLContext();

            // Define data path
            var dataPath = Path.GetFullPath(@"c:\temp\AutoMl\Material.csv");

            // Infer column information
            ColumnInferenceResults columnInference =
                ctx.Auto().InferColumns(dataPath, labelColumnName: "value",
                    groupColumns: false);

            // Create text loader
            TextLoader loader =
                ctx.Data.CreateTextLoader(columnInference.TextLoaderOptions);

            // Load data into IDataView
            IDataView data = loader.Load(dataPath);

            DataOperationsCatalog.TrainTestData trainValidationData =
                ctx.Data.TrainTestSplit(data, testFraction: 0.2);

            SweepablePipeline pipeline =
                ctx.Auto().Featurizer(data, columnInformation: columnInference.ColumnInformation)
                    .Append(ctx.Auto().Regression(labelColumnName: columnInference.ColumnInformation
                            .LabelColumnName));

            AutoMLExperiment experiment = ctx.Auto().CreateExperiment();

            experiment
                .SetPipeline(pipeline)
                .SetRegressionMetric(RegressionMetric.RSquared, labelColumn: columnInference.ColumnInformation.LabelColumnName)
                .SetTrainingTimeInSeconds(30)
                .SetDataset(trainValidationData);

            // Log experiment trials
            ctx.Log += (_, e) =>
            {
                if (e.Source.Equals("AutoMLExperiment"))
                {
                    Console.WriteLine(e.RawMessage);
                }
            };
            TrialResult experimentResults = experiment.Run();
        }

LittleLittleCloud commented 1 year ago

How large is your dataset? AutoML was designed not to cancel a trial when no trial has completed yet, even after the time budget runs out. So if your dataset is large, the first trial might take a long time to finish.

We also changed the AutoML cancellation behavior in the nightly build, after the MaxModelToExplored option was added, so that it respects the maximum training time budget. You can therefore also update to the latest ML.NET nightly build, which should cancel the running trial once the time budget is used up.

gbanuk2 commented 1 year ago

The dataset is a 66,000-line CSV with two columns of data, totalling 5M. Ten seconds isn't enough for a good result, but it's enough to see whether the program is working. I tried setting the training time to 30 seconds, and it still hadn't returned after 4 minutes.

The mlnet command-line tool returns after 10 seconds:

        mlnet classification --dataset "Material.csv" --label-col 1 --name "Material" --train-time 10

We've been using this to train on the data in the meantime.

LittleLittleCloud commented 1 year ago

The mlnet CLI uses some additional techniques to help the first trial end quickly, such as starting from a subset of the dataset. That is probably why mlnet finishes in 10 seconds.
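As a rough illustration of the subset idea (not the CLI's actual implementation, which this thread does not show), the sketch below shuffles the loaded data and keeps only a capped number of rows before splitting. The class name, method name, and the `maxRows` default are hypothetical; `ShuffleRows`, `TakeRows`, and `TrainTestSplit` are existing `ctx.Data` operations.

```csharp
using Microsoft.ML;
using static Microsoft.ML.DataOperationsCatalog;

public static class SubsetSketch
{
    // Hypothetical helper: returns a train/test split built from at most
    // maxRows rows of the input, so the first trial has less data to fit.
    public static TrainTestData SplitFromSubset(
        MLContext ctx, IDataView data, long maxRows = 5000)
    {
        // Shuffle first so the subset is not just the top of the file.
        IDataView shuffled = ctx.Data.ShuffleRows(data, seed: 0);

        // Keep only the first maxRows rows of the shuffled view.
        IDataView subset = ctx.Data.TakeRows(shuffled, maxRows);

        // Same 80/20 split as in the original report.
        return ctx.Data.TrainTestSplit(subset, testFraction: 0.2);
    }
}
```

With the code from the report, the result could be passed in place of `trainValidationData`, for example `.SetDataset(SubsetSketch.SplitFromSubset(ctx, data))`. A model selected on a subset may not be the best one for the full dataset, so retraining on all rows afterwards is worth considering.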

To speed up your training, you can also disable 'sdca' and 'lbfgs'. Those trainers can be slower than expected in some situations. FastTree, FastForest and LightGbm, by contrast, usually finish quickly when the given hyper-parameters are small.
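A minimal sketch of how a trainer might be switched off when building the pipeline from the report, assuming the `useSdca` flag on `ctx.Auto().Regression(...)`. The class and method names are hypothetical. The flag for the L-BFGS trainer appears to be named differently across ML.NET versions, so it is left as a commented-out assumption to be checked against the installed package.

```csharp
using Microsoft.ML;
using Microsoft.ML.AutoML;

public static class PipelineSketch
{
    // Hypothetical helper: same pipeline as in the report, minus SDCA.
    public static SweepablePipeline BuildWithoutSdca(
        MLContext ctx, IDataView data, ColumnInformation columnInformation)
    {
        return ctx.Auto().Featurizer(data, columnInformation: columnInformation)
            .Append(ctx.Auto().Regression(
                labelColumnName: columnInformation.LabelColumnName,
                // Exclude the SDCA trainer from the search.
                useSdca: false
                // Assumption: the L-BFGS flag name depends on the version
                // (for example `useLbfgs`); verify the signature before adding it.
                ));
    }
}
```

The tree-based trainers (FastTree, FastForest, LightGbm) stay enabled by default in this sketch, which matches the suggestion above that they usually finish quickly.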

luisquintanilla commented 1 year ago

@LittleLittleCloud does it make sense to document this and provide the optimizations that the CLI performs to improve training time?