dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.

https://dot.net/ml

Ability to limit AutoML resource using (amount of parallel threads) #6061

Open 80LevelElf opened 2 years ago

80LevelElf commented 2 years ago

The current AutoML is hard to use when you train a lot of models at the same time in the cloud. If you run several pods in your Kubernetes cluster, it doesn't matter how many AutoML experiments you execute at once: whether it is 4 experiments or 1, they use 100% of the CPU (which breaks health checks and so on).

The low-level trainer API (for example FastForestBinaryTrainer) has options like NumberOfThreads and other trainer-specific options that you can use to control the workload.

Is it possible to add something like this to the AutoML API?

But the best solution would be some sort of smart property like ResourceUsingRatio, ranging from 0.0 to 1.0, where ResourceUsingRatio = 1.0 means the experiment uses the maximum resources (mainly CPU) that it needs or that the machine has.
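One way to read the ratio idea is as a cap on worker threads derived from the core count. The following is a language-neutral sketch in Python, not ML.NET code: `threads_for_ratio` and its parameters are hypothetical names, and rounding down with a floor of one thread is an assumption rather than anything the AutoML API defines.

```python
import math
import os
from typing import Optional


def threads_for_ratio(ratio: float, cpu_count: Optional[int] = None) -> int:
    """Map a usage ratio in [0.0, 1.0] to a thread cap (hypothetical helper)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    # os.cpu_count() can return None, so fall back to a single core.
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    if cores < 1:
        raise ValueError("cpu_count must be at least 1")
    # Round down so the cap never exceeds the requested share,
    # but always allow at least one thread so training can progress.
    return max(1, math.floor(ratio * cores))
```

Inside a container, the core count reported by the runtime may be the node's rather than the pod's CPU limit, so passing `cpu_count` explicitly is probably the safer choice there.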

michaelgsharp commented 2 years ago

@80LevelElf thanks for your suggestion, this is something that would be good to address. We are currently in the process of updating our AutoML implementation, so this is something that we can look at during this time.

@JakeRadMSFT @LittleLittleCloud thoughts?

LittleLittleCloud commented 2 years ago

In Model Builder and the mlnet CLI, users can limit the number of threads one experiment can use via env:MLNET_MAX_THREAD. We could probably do the same via an AutoML experiment setting.

michaelgsharp commented 2 years ago

@LittleLittleCloud will that environment variable work with AutoML as is, as a potential workaround for now?

LittleLittleCloud commented 2 years ago

Nope, it's only available in Model Builder and the CLI.

michaelgsharp commented 2 years ago

Yeah, I just checked and the main ML.NET framework itself doesn't know anything about that.

With the rework going on in AutoML, is this something that could be included pretty easily?

LittleLittleCloud commented 2 years ago

Yup, we just need to expose it through an AutoML experiment setting and override the default value when creating the trainer.
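The shape of that change can be sketched as follows. This is a language-neutral illustration in Python, not the ML.NET implementation: `ExperimentSettings`, `TrainerOptions`, `max_threads`, and `create_trainer_options` are hypothetical stand-ins for whatever the reworked AutoML ends up exposing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExperimentSettings:
    """Hypothetical experiment-level settings; not the ML.NET AutoML type."""
    max_threads: Optional[int] = None  # None means "keep the trainer default"


@dataclass
class TrainerOptions:
    """Hypothetical stand-in for a trainer's options object."""
    number_of_threads: Optional[int] = None  # None means "let the trainer decide"


def create_trainer_options(settings: ExperimentSettings) -> TrainerOptions:
    """Build trainer options, overriding the default thread count if a cap is set."""
    options = TrainerOptions()
    if settings.max_threads is not None:
        if settings.max_threads < 1:
            raise ValueError("max_threads must be at least 1")
        # The experiment-level cap wins over the trainer's own default.
        options.number_of_threads = settings.max_threads
    return options
```

Leaving the setting unset keeps today's behavior, so under this assumption the cap would be opt-in and would not change existing experiments.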