dotnet / machinelearning-modelbuilder

Simple UI tool to build custom machine learning models.

Inspect AutoML parameter choices #688

Open · JacksonKearl opened this issue 4 years ago

JacksonKearl commented 4 years ago

Currently when I run an AutoML experiment either from the CLI or manually, I get a log of the runs and the best run, but I can't seem to access the actual parameters (maxIterations, regularizations, etc.) that it used for that run. This makes it very difficult to reproduce single well performing runs.

Is there any way to get the hyperparameter config for the runs AutoML is queuing? Could this be added?
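
(For reference, a minimal sketch of what the public `Microsoft.ML.AutoML` API surfaces per run; the file name and `ModelInput` class are placeholders for your own data. Each `RunDetail` exposes the trainer name and metrics, but not the swept hyperparameter values this issue asks about.)

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

public class ModelInput
{
    [LoadColumn(0)] public float Feature1 { get; set; }
    [LoadColumn(1)] public float Label { get; set; }
}

public static class InspectRuns
{
    public static void Main()
    {
        var mlContext = new MLContext();
        IDataView data = mlContext.Data.LoadFromTextFile<ModelInput>(
            "data.csv", hasHeader: true, separatorChar: ',');  // placeholder dataset

        var experiment = mlContext.Auto().CreateRegressionExperiment(
            maxExperimentTimeInSeconds: 600);

        // Fires once per pipeline tried: TrainerName and metrics are public,
        // but the swept hyperparameter values are not surfaced here.
        var progress = new Progress<RunDetail<RegressionMetrics>>(run =>
            Console.WriteLine($"{run.TrainerName}: R^2={run.ValidationMetrics?.RSquared:F4}"));

        var result = experiment.Execute(data, labelColumnName: "Label",
            progressHandler: progress);
        Console.WriteLine($"Best run trainer: {result.BestRun.TrainerName}");
    }
}
```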

JacksonKearl commented 4 years ago

Here, for instance, the best runs were all towards the bottom of the experiment, but I have no clue how to reproduce them without running the whole thing over again.

[screenshot: AutoML experiment run log]

gvashishtha commented 4 years ago

@justinormont can you comment on this? Good to hear from you, Jackson. Usually, I would expect the training code for the best model to be generated in your output. Is that not happening for you?

JacksonKearl commented 4 years ago

@gvashishtha hey Gopal,

When I run from the CLI, code is generated, but I believe that also doesn't include the hyperparameters. However, this is running from the API, which does not generate code. I need to run from the API to customize the runner in ways the CLI does not allow.

Either way, it'd be nice for the logs to include information on how to reproduce experiments, regardless of whether that experiment was the "best". If it's able to find a few good tunings, I'd like to see how to reproduce all of them.

justinormont commented 4 years ago

Missing hyperparameters in CodeGen

@JacksonKearl : When CodeGen runs from the CLI or Model Builder, the best pipeline is generated as C# code, including its hyperparameters.

The case where you do not see hyperparameters is when the winning pipeline uses the default hyperparameters. Default hyperparameters are implicit and are not listed.
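
(To make the distinction concrete, here is a hypothetical illustration — not verbatim CodeGen output — using ML.NET's SDCA regression trainer: with defaults the options are simply omitted, while a swept pipeline spells its values out. The specific numbers are invented for the example.)

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers;

var mlContext = new MLContext();

// Default hyperparameters: nothing to print, the options stay implicit.
var defaultTrainer = mlContext.Regression.Trainers.Sdca(
    labelColumnName: "Label", featureColumnName: "Features");

// Non-default (swept) hyperparameters: the values appear explicitly.
var sweptTrainer = mlContext.Regression.Trainers.Sdca(
    new SdcaRegressionTrainer.Options
    {
        LabelColumnName = "Label",
        FeatureColumnName = "Features",
        L2Regularization = 0.01f,        // invented example value
        MaximumNumberOfIterations = 100  // invented example value
    });
```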

AutoML sweeping strategy:

It's likely you need more runtime to see non-default hyperparameters, as the defaults are honed to work well on most datasets, and it takes some iterations to beat them on any specific dataset. Aim for ~150 pipelines tried so the Bayesian-style sweeper has enough iterations to converge.
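
(A minimal sketch of giving the sweeper that budget through the experiment settings; `MaxExperimentTimeInSeconds` is the knob that controls how many pipelines get tried, and the 3600-second figure is an arbitrary example.)

```csharp
using Microsoft.ML;
using Microsoft.ML.AutoML;

var mlContext = new MLContext();

// A longer budget lets the Bayesian-style sweeper iterate enough times
// (~150 pipelines) to beat the tuned defaults on a specific dataset.
var settings = new RegressionExperimentSettings
{
    MaxExperimentTimeInSeconds = 3600,            // example value
    OptimizingMetric = RegressionMetric.RSquared
};
var experiment = mlContext.Auto().CreateRegressionExperiment(settings);
```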

Improving the display of all pipelines tried

Current state: The CodeGen only produces the one winning pipeline. The AutoML log file shows the MAML command for all pipelines tried.

There's no general method for serializing an ML.NET pipeline. The closest options are a MAML command, the EntryPoints API, or the C# estimator API.
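
(For completeness: what is supported today is persisting the *trained* model, i.e. the fitted transformer chain. That reproduces scoring but stays opaque; it does not give back a readable hyperparameter recipe. A sketch, assuming `mlContext`, `data`, and a fitted `model` from a finished run as in the earlier sketch:)

```csharp
// Saves the fitted transformer chain; reproduces predictions,
// but does not surface the hyperparameters as readable code.
mlContext.Model.Save(model, data.Schema, "BestModel.zip");

// Reload later for scoring only.
ITransformer reloaded = mlContext.Model.Load("BestModel.zip", out DataViewSchema schema);
```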

Options for showing pipeline configurations:

  1. The MAML command style is printed in the AutoML log file (current state)
  2. We could return AutoML's internal pipeline object, though it wouldn't make sense to most people. The internal representation was not meant to be read by a person. (easiest to implement)
  3. We could convert the pipeline object to the EntryPoints API, though this would be useful mostly to NimbusML users. (wouldn't recommend)
  4. We could run CodeGen on all pipelines tried (or allow the user to choose which to run it on). I'm unsure whether the CodeGen APIs are currently exposed; if I recall correctly, the CLI and Model Builder call CodeGen through internal APIs. (likely the best method; a user-side approximation via the public run list is sketched below)
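
(Until something like option 4 exists, a user-side approximation is to rank every run from the public API and at least recover which trainer each good run used. A sketch, reusing the `result` from the earlier experiment sketch; the swept hyperparameter values are still not recoverable this way.)

```csharp
using System.Linq;

// Rank all pipelines tried by validation metric.
var topRuns = result.RunDetails
    .Where(r => r.ValidationMetrics != null)
    .OrderByDescending(r => r.ValidationMetrics.RSquared)
    .Take(5);

foreach (var run in topRuns)
    Console.WriteLine(
        $"{run.TrainerName}: R^2={run.ValidationMetrics.RSquared:F4}, {run.RuntimeInSeconds:F1}s");
```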

CLI improvements

@JacksonKearl : What features from the API would have to be added to the CLI to support your use case?