Consider limiting the search range of tree parameters to save system memory

torronen commented 2 years ago

It seems Model Builder has a better search range for hyperparameters than the defaults in the ML.NET AutoML library. However, the trees can grow so large that they no longer fit in the RAM + swap space of even high-end development machines. For example, latest-generation Ryzen CPUs are limited to 128 GB of RAM, and heavy swap usage makes training very slow.

Limiting the tree size in Model Builder might suit the general developer audience. In my experience, Model Builder can require about 600 GB of total working memory with the current settings. That much is only needed on some datasets, so the detrimental effects of a smaller search range would be limited. On the other hand, even if the biggest possible tree would be ideal in rare cases, the user experience would still be better with a limit, even when the ideal tree size is not reached.

If possible, it might be useful to provide the current search space / "sweepable range" / "pipeline suggester" as part of the ML.NET library. It would be useful for running on server machines with more system resources. Otherwise, adjusting the search range for AutoML seems to require a custom build of the ML.NET library.

(At the moment, an experiment is taking 375 GB on my machine. It is close to my computer's limit, so I am considering whether to let it run overnight and risk an exception, or to stop it now.)

torronen commented 2 years ago

Maybe as part of planning item 1591? If so, how about allowing the user to set the parameter search range? Something like: low (< 8 GB), medium (< 32 GB), high (< 128 GB), full (< 600 GB - 1 TB)? The memory limits are approximate, of course: occasional paging is not a big deal, but paging multiple times the size of RAM takes ages (in my experiments, multiple days).
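As a rough illustration of the tier idea above, here is a minimal Python sketch. It is not Model Builder or ML.NET code: the `SearchTier` type, the `pick_tier` helper, and the `max_leaves` / `max_trees` caps are all hypothetical placeholders. Only the tier names and approximate memory budgets come from the suggestion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchTier:
    name: str
    memory_limit_gb: int  # approximate budget from the proposal above
    max_leaves: int       # hypothetical cap, for illustration only
    max_trees: int        # hypothetical cap, for illustration only


# Ordered from smallest to largest memory budget.
TIERS = (
    SearchTier("low", 8, 64, 128),
    SearchTier("medium", 32, 256, 512),
    SearchTier("high", 128, 1024, 2048),
    SearchTier("full", 600, 32768, 32768),
)


def pick_tier(available_gb: float) -> SearchTier:
    """Return the largest tier whose budget fits in the available memory.

    Falls back to the smallest tier if nothing fits.
    """
    chosen = TIERS[0]
    for tier in TIERS:
        if tier.memory_limit_gb <= available_gb:
            chosen = tier
    return chosen


if __name__ == "__main__":
    for gb in (4, 16, 128, 1024):
        print(gb, "GB ->", pick_tier(gb).name)
```

Under these assumptions, a 128 GB workstation would get the "high" tier, and only a machine with 600 GB or more of working memory would sweep the full range.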

But I also would not mind if you reduced the limits in the next version already. I decided to let the model run, and it has now probably been running for 48+ hours. The worst part is that I just have to hope the next iteration is not even bigger, as that would terminate the run with an exception. (I can get the zip file from the temp folder, but I think this is currently an undocumented way to recover, so it is not ideal for many users.)

JakeRadMSFT commented 2 years ago

Hey @torronen! I'm not answering this specific question, I'll leave that to @LittleLittleCloud ... but ... would you be interested in meeting up with us to share your experiences with ML.NET?

You've provided us some amazing issues but I'd like to zoom out and learn your flow and what you're trying to do!

LittleLittleCloud commented 2 years ago

Unfortunately, there's no way to modify the search space in the current Model Builder, but we can definitely change that by putting the search space in the .mbconfig file so that it can be modified.

JakeRadMSFT commented 2 years ago

@LittleLittleCloud thoughts on limiting ours so it doesn't get too big and error out?

torronen commented 2 years ago

@JakeRadMSFT Sure, anytime, antti.torronen@kwork.fi. Actually, we had a call with part of the team a few months back. Since then there have been no updates on the usage of Model Builder. I am trying to find out the best ways of using ML.NET in our next projects.

Currently, I am planning a semi-automated workflow for data analysis and model training. The main finding might be that the ML.NET class library could use more extension points; for now I have simply edited the original code. The PoC code is messy, but after some work and testing I might be able to validate some ideas for further development. It may be best to wait until I have a chance to try these in a real project. So far, I have been working on things like simultaneous multi-device AutoML training (= sharing experiment results and parameters), separate model selection and optimization metrics, and custom selection metrics / criteria (such as minimum 20% recall, max precision for sales leads analysis).

Another piece of feedback: AutoML in the ML.NET class library is not as good as Model Builder's. Its search range could be bigger, and Model Builder seems to tune more parameters. With a bigger search range, the random contenders may become very slow. I have not solved this issue; I am currently running 2 training clients: one for low-end machines, another for servers with more RAM.

Next, I probably need to have a look at customizing transformers and the featurizer so that we would not need to create training data sets separately and could give raw data to the models. (+1 for customized featurizers in item 1591)
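To make the custom selection criterion mentioned above concrete (minimum 20% recall, then maximize precision), here is a minimal Python sketch. It is not ML.NET code: `TrialResult`, `select_model`, and the sample numbers are made up for illustration, and it only shows the selection step applied to metrics that an AutoML run has already produced.

```python
from typing import NamedTuple, Optional, Sequence


class TrialResult(NamedTuple):
    trainer: str      # label for the trial (hypothetical)
    precision: float
    recall: float


def select_model(
    trials: Sequence[TrialResult], min_recall: float = 0.20
) -> Optional[TrialResult]:
    """Pick the trial with the highest precision among those meeting min_recall.

    Returns None when no trial satisfies the recall constraint.
    """
    eligible = [t for t in trials if t.recall >= min_recall]
    if not eligible:
        return None
    return max(eligible, key=lambda t: t.precision)


if __name__ == "__main__":
    # Made-up metrics, for illustration only.
    trials = [
        TrialResult("trial-a", precision=0.95, recall=0.10),  # fails recall floor
        TrialResult("trial-b", precision=0.80, recall=0.25),
        TrialResult("trial-c", precision=0.70, recall=0.60),
    ]
    print(select_model(trials))
```

With this rule the selection metric (precision under a recall floor) stays separate from whatever metric the sweeper optimizes during training, which is the separation described above.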

torronen commented 2 years ago

Visual Studio 2022, latest ML.NET extension 16.7.6.215270, 60 GB dataset. This is the biggest memory usage I've seen so far in Model Builder. CPU usage is at around 9%, as it is reading from the paging file. Iteration no. 40. The dataset has 2600 columns, so it is possible the trees are overfitting heavily.

I think another way to solve it would be to allow disabling some of the trainers (such as all tree-based ones). The user would just need to be able to see which trainer is currently running.

dotnet / machinelearning-modelbuilder

Consider limiting the search range of tree parameters to save system memory #1875