JacksonKearl opened 4 years ago
Here for instance, the best runs were all towards the bottom of the experiment, but I have no clue how to reproduce them without running the whole thing over again.
@justinormont can you comment on this? Good to hear from you, Jackson. Usually, I would expect the training code for the best model to be generated in your output. Is that not happening for you?
@gvashishtha hey Gopal,
When I run from the CLI, code is generated, but I believe that also doesn't include the hyperparameters. However, this is running from the API, which does not generate code. I need to run from the API to customize the runner in ways the CLI does not allow.
Either way, it'd be nice for the logs to include information on how to reproduce experiments, regardless of whether that experiment was the "best". If it's able to find a few good tunings, I'd like to see how to reproduce all of them.
@JacksonKearl : When CodeGen runs from the CLI or Model Builder, the best pipeline is emitted as C# code, including its hyperparameters.
The case where you do not see hyperparameters is when the pipeline with default hyperparameters wins. Default hyperparameters are implicit and are not listed in the generated code.
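To illustrate (this is a hand-written sketch, not actual CodeGen output — the column names and hyperparameter values here are hypothetical), a winning pipeline with non-default hyperparameters would surface them as explicit trainer options:

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers.LightGbm;

var mlContext = new MLContext();

// Illustrative only: column names and values are made up.
// Non-default hyperparameters appear explicitly in the options object;
// hyperparameters left at their defaults would simply be omitted.
var pipeline = mlContext.Transforms.Concatenate("Features", "F1", "F2")
    .Append(mlContext.Regression.Trainers.LightGbm(
        new LightGbmRegressionTrainer.Options
        {
            NumberOfLeaves = 64,
            LearningRate = 0.05,
            NumberOfIterations = 200,
            LabelColumnName = "Label",
            FeatureColumnName = "Features"
        }));
```

If the default-hyperparameter pipeline wins, the generated code would call the parameterless `LightGbm(...)` overload instead, which is why no hyperparameters appear.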
AutoML sweeping strategy:
It's likely you need more runtime to see non-default hyperparameters: the defaults are honed to work well on most datasets, and it takes some iterations to beat them on any specific dataset. Aim for ~150 pipelines tried so the Bayesian-style sweeper has enough iterations to converge.
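With the AutoML API, the budget is controlled through the experiment settings — a minimal sketch (the one-hour figure is an assumption; tune it until roughly 150 pipelines have been tried):

```csharp
using Microsoft.ML;
using Microsoft.ML.AutoML;

var mlContext = new MLContext();

// Give the sweeper enough budget to move past the defaults;
// with short budgets the default pipeline often stays on top.
var settings = new RegressionExperimentSettings
{
    MaxExperimentTimeInSeconds = 3600 // assumption: adjust until ~150 pipelines are tried
};

var experiment = mlContext.Auto().CreateRegressionExperiment(settings);
```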
Current state: The CodeGen only produces the one winning pipeline. The AutoML log file shows the MAML command for all pipelines tried.
There's no general method for serializing an ML.NET pipeline. The closest options are a MAML command, the EntryPoints API, or the C# estimator API.
Options for showing pipeline configurations:
@JacksonKearl : What features from the API would have to be added to the CLI to support your use case?
Currently, when I run an AutoML experiment either from the CLI or manually, I get a log of the runs and the best run, but I can't seem to access the actual parameters (maxIterations, regularization, etc.) used for each run. This makes it very difficult to reproduce individual well-performing runs.
Is there any way to get the hyperparameter configuration for the runs AutoML is queuing? Could this be added?
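For context, the API does let you observe each run as it completes via a progress handler — a sketch of what is currently accessible per run (note that `RunDetail` exposes the trainer name and metrics, but not, as this issue points out, a full hyperparameter dump):

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Logs each pipeline AutoML tries as it finishes. This shows which
// trainer ran and how it scored, but not the hyperparameters used --
// which is the gap this issue asks to close.
class RunLogger : IProgress<RunDetail<RegressionMetrics>>
{
    public void Report(RunDetail<RegressionMetrics> run)
    {
        Console.WriteLine(
            $"{run.TrainerName}: RSquared={run.ValidationMetrics?.RSquared:F4} " +
            $"({run.RuntimeInSeconds:F1}s)");
    }
}

// Usage (trainData is an IDataView loaded elsewhere):
// var result = experiment.Execute(trainData, labelColumnName: "Label",
//     progressHandler: new RunLogger());
```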