dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

How to cancel an Experiment. I don't get that to work. #7290

Open jackpotcityco opened 1 week ago

jackpotcityco commented 1 week ago

Hello,

RAM: 64 GB CPU: i7-12800h

I am trying to understand how to cancel an experiment but just can't make it work. I have tried each of the two lines below (one at a time, not both together), triggered when: if (modelsLISTtemp.Count >= 3)

        MLContextExtensions.CancelExecution(mlContext); // Cancel experiment using "Microsoft.ML.Experimental"
        cancellationTokenSource.Cancel();               // Cancel the task running the experiment using: CancellationToken = cts

  1. The first uses .CancelExecution from Microsoft.ML.Experimental, but the experiment never stops; it runs to the end, which is more than 4 minutes after the .CancelExecution attempt.

  2. The second uses CancellationToken = cts, but exactly as above the experiment never stops; it runs to the end, which is more than 4 minutes after the cancellationTokenSource.Cancel() attempt.

In both cases, the experiment continues to return 10, 20, 30+ models during those 4 minutes.

How can I actually cancel the experiment on demand, rather than being stuck for the full 300 seconds it runs? (MaxExperimentTimeInSeconds = (uint)300)

trainData and hold_out_data are populated with training data when passed to the function testExperiment:

        void testExperiment(IDataView trainData, IDataView hold_out_data)
        {
            Random random = new Random();
            var mlContext = new MLContext(seed: random.Next());
            var cancellationTokenSource = new CancellationTokenSource();
            var cts = new CancellationToken();
            object obj = new object();
            var modelsLISTtemp = new List<(string trainerName, object validationMetrics, ITransformer model)>();

            ExperimentBase<RegressionMetrics, RegressionExperimentSettings> regression_Experiment = null;
            regression_Experiment = mlContext.Auto().CreateRegressionExperiment(new RegressionExperimentSettings
            {
                MaxExperimentTimeInSeconds = (uint)300,
                CacheBeforeTrainer = CacheBeforeTrainer.Off,
                CacheDirectoryName = "C:/Aintelligence/temp/cache",
                MaximumMemoryUsageInMegaByte = 16384,
                OptimizingMetric = RegressionMetric.RSquared,
                CancellationToken = cts
            });

            // Progress handler for regression
            var regressionProgressHandler = new Progress<RunDetail<RegressionMetrics>>(ph =>
            {
                if (ph.ValidationMetrics != null && !ph.TrainerName.Contains("FastForest")) { progress(Math.Round(ph.ValidationMetrics.RSquared, 3), ph.TrainerName, ph.ValidationMetrics, ph.Model); }
            });
            void progress(double metricValue, string TrainerName, object ValidationMetrics, ITransformer Model)
            {
                lock (obj) { modelsLISTtemp.Add((TrainerName, ValidationMetrics, Model)); }
                if (modelsLISTtemp.Count >= 3)
                {
                    MLContextExtensions.CancelExecution(mlContext); //Cancel experiment using "Microsoft.ML.Experimental
                    cancellationTokenSource.Cancel();               // Cancel the task running the experiment using: CancellationToken = cts
                }
            }

            //Execute experiment
            var results = regression_Experiment.Execute(trainData, hold_out_data, labelColumnName: "Label", progressHandler: regressionProgressHandler);

        }

LittleLittleCloud commented 1 week ago

Which ML.NET version are you using?

You can pass a cancellation token, set a maximum training time, or set a maximum number of models to explore in AutoML.

To cancel using a cancellation token, you can refer to this test sample. You can find other cancellation examples in the same test class.

Note that the currently running trial can't be interrupted by cancellation unless the trainer is implemented in managed code. For example, the LightGBM trainer can't be interrupted because it's implemented in native code.
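
One thing that may be worth checking in the posted snippet: `cts` is created with `new CancellationToken()`, which is not connected to `cancellationTokenSource`, so calling `Cancel()` on the source would not be observed through that token. The sketch below uses only the .NET base class library (no ML.NET calls) to show the difference; `TokenDemo` and its member names are hypothetical and exist only for illustration.

```csharp
using System;
using System.Threading;

public static class TokenDemo
{
    // Reports whether each kind of token observes a Cancel() call on the source.
    public static (bool detached, bool linked) Run()
    {
        var source = new CancellationTokenSource();

        // A default-constructed token is not tied to any source,
        // so it can never be cancelled.
        var detachedToken = new CancellationToken();

        // The token obtained from the source is the one that Cancel() signals.
        var linkedToken = source.Token;

        source.Cancel();

        return (detachedToken.IsCancellationRequested,
                linkedToken.IsCancellationRequested);
    }

    public static void Main()
    {
        var (detached, linked) = Run();
        Console.WriteLine($"detached token cancelled: {detached}"); // False
        Console.WriteLine($"linked token cancelled:   {linked}");   // True
    }
}
```

If that is the cause here, setting `CancellationToken = cancellationTokenSource.Token` in `RegressionExperimentSettings` would be the thing to try. As noted above, a trial that is running in native code may still finish before the experiment stops.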