dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Smart train memory handling for AutoML (ML.net 3) #6925

Open 80LevelElf opened 6 months ago

80LevelElf commented 6 months ago

Is your feature request related to a problem? Please describe.

Here are our current training settings:

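    // Assumes an existing MLContext instance named mlContext.
    var settings = new BinaryExperimentSettings
    {
        MaxExperimentTimeInSeconds = 30 * 60,
        MaxModels = 10,
        MaximumMemoryUsageInMegaByte = 7500,
    };

    var experiment = mlContext.Auto().CreateBinaryClassificationExperiment(settings);

    // The third argument is the sampling key column.
    ExperimentResult<BinaryClassificationMetrics> experimentResult = experiment
        .Execute(trainDataView, nameof(MlModelRow.Label), nameof(MlModelRow.LearningGroup));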

When training uses more than 7500 MB, it is canceled. But if we do not set MaximumMemoryUsageInMegaByte, training uses a lot of memory, in many cases more than our pod currently has (> 36 GB).

According to the logs, memory usage also varies a lot between runs. Training on very similar data sets can take 10 GB the first time and 30 GB the second time.

Describe the solution you'd like

Ideally, the memory limit would be the maximum amount of memory ML.NET may use for training, without canceling anything. For example, with a limit of 7500 MB and each training run taking 2500 MB, AutoML would train 3 models at a time.
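To make the arithmetic in this request concrete, here is a minimal sketch of budget-based scheduling. It is not how AutoML works today and uses no ML.NET API: `MemoryBudget`, `MaxConcurrentTrainings`, and `RunAllAsync` are hypothetical names, and the per-training memory figure is assumed to be known up front.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of ML.NET.
public static class MemoryBudget
{
    // Number of trainings that fit in the budget at once.
    // Returns at least 1 so that work can still progress when a single
    // training is estimated to exceed the budget.
    public static int MaxConcurrentTrainings(int budgetMb, int perTrainingMb)
    {
        if (budgetMb <= 0) throw new ArgumentOutOfRangeException(nameof(budgetMb));
        if (perTrainingMb <= 0) throw new ArgumentOutOfRangeException(nameof(perTrainingMb));
        return Math.Max(1, budgetMb / perTrainingMb);
    }

    // Runs every training, but never more than the budget allows at the same time.
    public static async Task RunAllAsync(
        IEnumerable<Func<Task>> trainings, int budgetMb, int perTrainingMb)
    {
        using var gate = new SemaphoreSlim(MaxConcurrentTrainings(budgetMb, perTrainingMb));

        var running = trainings.Select(async train =>
        {
            await gate.WaitAsync();          // wait for a free slot in the budget
            try { await Task.Run(train); }   // run the training on the thread pool
            finally { gate.Release(); }      // free the slot for the next training
        }).ToList();

        await Task.WhenAll(running);
    }
}
```

With the numbers from this issue, `MemoryBudget.MaxConcurrentTrainings(7500, 2500)` gives 3. The hard part in practice is the per-training estimate, since the logs show very different memory usage for similar data.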

80LevelElf commented 6 months ago

Also, it looks like MaximumMemoryUsageInMegaByte restricts the memory usage of the whole process, not just the memory used by training.
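If the limit really applies to the whole process, one possible workaround is to measure how much memory the process already holds before training and add the desired training budget on top. The sketch below assumes that the working set is a reasonable stand-in for whatever AutoML monitors, which is not confirmed here; `ProcessMemory` and its methods are hypothetical names.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of ML.NET.
public static class ProcessMemory
{
    // Current working set of this process in megabytes.
    // Assumption: the working set approximates the value the limit is checked against.
    public static double WorkingSetMb()
    {
        using var process = Process.GetCurrentProcess();
        return process.WorkingSet64 / (1024.0 * 1024.0);
    }

    // Whole-process limit that leaves trainingBudgetMb for training
    // on top of the memory the process already used before training.
    public static double ProcessLimitFor(double baselineMb, double trainingBudgetMb)
    {
        if (baselineMb < 0) throw new ArgumentOutOfRangeException(nameof(baselineMb));
        if (trainingBudgetMb <= 0) throw new ArgumentOutOfRangeException(nameof(trainingBudgetMb));
        return baselineMb + trainingBudgetMb;
    }
}
```

For example, calling `ProcessMemory.WorkingSetMb()` right before the experiment and passing the result of `ProcessLimitFor(baseline, 7500)` as MaximumMemoryUsageInMegaByte would keep the 7500 MB for training itself, under the assumption above.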

As for a smart training memory limit (like in Vowpal Wabbit): maybe train a small model to predict approximate training memory usage, and use that prediction to decide how many models to train at once and which trainers to use?
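As a rough illustration of the "small model" idea, the sketch below fits a straight line to peak memory versus row count from past runs using ordinary least squares. Everything in it is an assumption: `MemoryEstimator` is a hypothetical name, row count is only one possible input feature, and the logs described in this issue suggest that real memory usage may be much noisier than a linear fit can capture.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of ML.NET.
public static class MemoryEstimator
{
    // Ordinary least squares fit: PeakMb ≈ Intercept + Slope * Rows.
    public static (double Slope, double Intercept) Fit(
        IReadOnlyList<(double Rows, double PeakMb)> runs)
    {
        if (runs.Count < 2)
            throw new ArgumentException("Need at least two past runs.", nameof(runs));

        double meanRows = runs.Average(r => r.Rows);
        double meanPeak = runs.Average(r => r.PeakMb);
        double sxy = runs.Sum(r => (r.Rows - meanRows) * (r.PeakMb - meanPeak));
        double sxx = runs.Sum(r => (r.Rows - meanRows) * (r.Rows - meanRows));
        if (sxx == 0)
            throw new ArgumentException("All runs have the same row count.", nameof(runs));

        double slope = sxy / sxx;
        return (slope, meanPeak - slope * meanRows);
    }

    // Predicted peak memory in megabytes for a data set with the given row count.
    public static double Predict((double Slope, double Intercept) model, double rows)
        => model.Intercept + model.Slope * rows;
}
```

The predicted value could then serve as the per-training estimate when deciding how many trainings fit into the memory limit. A separate fit per trainer would probably be needed, since different trainers are likely to have very different memory profiles.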