Status: Open. gbanuk2 opened this issue 1 year ago.
How large is your dataset? AutoML was designed not to cancel a trial while no trial has completed yet, even after the time budget runs out. So if your dataset is large, the first trial alone might take a long time to finish.
In the nightly build, after the MaxModelToExplored option was added, we also changed AutoML's cancellation behavior so that it respects the maximum training-time budget. You can update to the latest nightly build of ML.NET, which should cancel the running trial once the time budget is used up.
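The difference between the two cancellation policies can be illustrated with a small simulation. This is a minimal Python sketch, not ML.NET's implementation: the run_experiment function, its strict flag, and the trial durations are hypothetical. With strict=False it models the original policy (a trial is never cancelled while no trial has completed, so the first trial can overrun the budget); with strict=True it models the nightly-build policy (a trial is cut off once the budget is spent).

```python
def run_experiment(trial_durations, budget_s, strict):
    """Simulate sequential trials against a time budget (hypothetical model).

    trial_durations: seconds each trial would need if allowed to finish.
    strict=False: never cancel a trial while no trial has completed yet.
    strict=True:  cancel any trial as soon as the budget is spent.
    Returns (elapsed_seconds, completed_trial_durations, cancelled_count).
    """
    elapsed = 0.0
    completed = []
    cancelled = 0
    for duration in trial_durations:
        if elapsed >= budget_s:
            break  # budget used up: start no new trials
        remaining = budget_s - elapsed
        # Under the original policy the first trial may run past the budget.
        may_overrun = not completed and not strict
        if duration <= remaining or may_overrun:
            elapsed += duration
            completed.append(duration)
        else:
            elapsed += remaining  # trial is cut off when the budget runs out
            cancelled += 1
            break
    return elapsed, completed, cancelled


# A 30-second budget where the first trial would need 240 seconds.
print(run_experiment([240, 5, 5], budget_s=30, strict=False))  # (240.0, [240], 0)
print(run_experiment([240, 5, 5], budget_s=30, strict=True))   # (30.0, [], 1)
```

In this toy model the original policy returns only after 240 seconds, similar to the multi-minute wait reported in this thread, while the strict policy stops at 30 seconds but can end with no completed trial at all.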
The dataset is a 66,000-line CSV file with two columns, totalling 5M. A 10-second budget doesn't produce a good result, but it's enough to see whether the program is working. I tried setting the TrainingTime to 30 seconds, and it still hadn't returned after 4 minutes.
The mlnet command-line tool returns after 10 seconds:
mlnet classification --dataset "Material.csv" --label-col 1 --name "Material" --train-time 10
We've been using this to train on the data in the meantime.
The mlnet CLI uses additional techniques to make the first trial finish quickly, such as starting from a subset of the dataset. This is probably why you see mlnet finish in 10 seconds.
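The idea of starting from a subset can be sketched as follows. The CLI's actual sampling strategy is not described in this thread, so the 10% starting fraction, the doubling schedule, and the function names subset_sizes and sample_rows are all assumptions made for illustration.

```python
import random


def subset_sizes(n_rows, first_fraction=0.1, growth=2.0):
    """Hypothetical schedule: row counts for successive trials, growing to the full dataset."""
    size = max(1, int(n_rows * first_fraction))
    sizes = []
    while size < n_rows:
        sizes.append(size)
        size = int(size * growth)
    sizes.append(n_rows)  # later trials use every row
    return sizes


def sample_rows(rows, k, seed=0):
    """Pick k rows uniformly at random without replacement; the seed makes it reproducible."""
    return random.Random(seed).sample(rows, k)


sizes = subset_sizes(66_000)
print(sizes)  # [6600, 13200, 26400, 52800, 66000]
first_batch = sample_rows(list(range(66_000)), sizes[0])
print(len(first_batch))  # 6600
```

For a 66,000-row dataset like the one described here, the first trial in this sketch sees only a tenth of the rows, so it is far more likely to finish inside a short budget than a trial on the full data.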
To speed up training, you can also disable the 'sdca' and 'lbfgs' trainers, which can be slower than expected in some situations. FastTree, FastForest, and LightGbm, by contrast, usually finish quickly when the sampled hyperparameter values (such as the number of trees or leaves) are small.
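Excluding slow trainers amounts to removing them from the candidate pool before the search starts. The following is a hypothetical Python illustration, not the ML.NET API: the filter_trainers function and the candidate names are assumptions, and in ML.NET itself you would use the trainer options exposed by the AutoML API.

```python
def filter_trainers(candidates, excluded_prefixes):
    """Drop candidate trainers whose name starts with any excluded prefix (case-insensitive)."""
    banned = tuple(prefix.lower() for prefix in excluded_prefixes)
    return [name for name in candidates if not name.lower().startswith(banned)]


# Illustrative candidate names; the real search space is defined by the AutoML API.
candidates = ["SdcaMaximumEntropy", "LbfgsMaximumEntropy", "FastTree", "FastForest", "LightGbm"]
print(filter_trainers(candidates, ["sdca", "lbfgs"]))  # ['FastTree', 'FastForest', 'LightGbm']
```

Filtering by prefix keeps the sketch short; an explicit allow-list of trainer names would be a stricter alternative.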
@LittleLittleCloud, does it make sense to document this and to provide the optimizations that the CLI performs to improve training time?
System Information:
Describe the bug
No matter what value I pass to SetTrainingTimeInSeconds, running the experiment never returns. I'm using the example from: https://learn.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/how-to-use-the-automl-api
To Reproduce