dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

[Question] Is there a (citable) description of the AutoML subcomponent? #5947

Open PGijsbers opened 3 years ago

PGijsbers commented 3 years ago

MLNet.CLI has been integrated with the automl benchmark (courtesy of @LittleLittleCloud), and we would like to provide a brief description of the system on the webpage and in our paper. Is there any documentation or paper detailing the general process with which MLNet.CLI generates its ML models? A paper would be best, but a clear documentation page would also suffice. Unfortunately, I could only find usage documentation. Sorry if this is not the correct place to ask.

torronen commented 3 years ago

+1 for this documentation. The source is pretty OK to follow, but it would be nice to have documentation for each of the sub-components/folders as a quick start for developers, too.

There might be some performance difference between the class library/API (NuGet package) and the CLI and Model Builder. It would be interesting to see a benchmark comparing them.

I might be mistaken, but based on the hyperparameters I see, I think the CLI and Model Builder use a different tuning algorithm than SmacSweeper from the class library, or at least different sweepable ranges and slightly different attributes. Many of the hyperparameters from Model Builder are out of range for the sweeper in the ML.NET class library. Maybe LittleLittleCloud can confirm or refute this. It would seem Model Builder has a better tuning engine (sweeper?).

LittleLittleCloud commented 3 years ago

Hi @PGijsbers

There's no specific paper describing how mlnet.cli tunes its models, but it mostly uses technology from FLAML, which includes CFO tuning, ECI for model selection, and subsampling for large datasets. The architecture and search space are likely to differ, though, since it uses ML.NET as the ML framework.
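For readers unfamiliar with the term, the rough idea behind CFO-style tuning can be sketched as a randomized direct search: start from a cheap configuration, propose a step in a random direction, try the opposite direction if that does not help, and shrink the step after repeated failures. The toy below is only an illustration under stated assumptions. It is not FLAML's or mlnet.cli's implementation, and the function name `cfo_like_search`, its parameters, and the step-halving rule are all hypothetical choices made for this sketch.

```python
import math
import random


def cfo_like_search(loss, start, step=1.0, min_step=1e-3, max_evals=200, seed=0):
    """Toy randomized direct search, loosely in the spirit of CFO.

    Assumptions (not taken from FLAML or mlnet.cli): a continuous,
    unconstrained search space, a fixed evaluation budget, and a step
    size that is halved after 2 * dim consecutive failed proposals.
    """
    rng = random.Random(seed)
    dim = len(start)
    best = list(start)
    best_loss = loss(best)
    evals = 1
    failures = 0

    while evals < max_evals and step > min_step:
        # Sample a random unit direction.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        direction = [d / norm for d in direction]

        improved = False
        # Try the sampled direction first, then its opposite.
        for sign in (1.0, -1.0):
            candidate = [b + sign * step * d for b, d in zip(best, direction)]
            candidate_loss = loss(candidate)
            evals += 1
            if candidate_loss < best_loss:
                best, best_loss = candidate, candidate_loss
                improved = True
                break
            if evals >= max_evals:
                break

        if improved:
            failures = 0
        else:
            failures += 1
            # Shrink the step after repeated failures (hypothetical rule).
            if failures >= 2 * dim:
                step /= 2.0
                failures = 0

    return best, best_loss, evals
```

With a hypothetical objective such as `lambda x: (x[0] - 3.0) ** 2 + (x[1] + 2.0) ** 2` and `start=[0.0, 0.0]`, the search only ever accepts candidates that lower the loss, so the returned loss never exceeds the starting one. A real tuner would also account for the cost of each trial and for bounded or discrete hyperparameters, which this sketch ignores.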

We can consider creating a separate document for mlnet.cli's architecture and search space. @JakeRadMSFT @luisquintanilla Any thoughts on this?

PGijsbers commented 3 years ago

FWIW I would value such documentation that describes the differences, even if the search procedure is (largely) just FLAML.

luisquintanilla commented 1 year ago

Not sure if this addresses the question, but here is a doc that was recently published on the topic.

https://learn.microsoft.com/en-us/dotnet/machine-learning/automated-machine-learning-mlnet#automl-in-mlnet