Ennosigaeon / xautoml

XAutoML: A Visual Analytics Tool for Understanding and Validating Automated Machine Learning
BSD 3-Clause "New" or "Revised" License

Compatibility with H2O-3 #3

Open FavioVazquez opened 2 years ago

FavioVazquez commented 2 years ago

Hi guys! Amazing job you did with this package. I work at H2O.ai and I'd like to know how I can help make this compatible with our open source AutoML solution.

Let me know how we can get started helping you with this :)

Ennosigaeon commented 2 years ago

Thanks for the positive feedback!

To integrate H2O-3, basically three pieces of information are necessary:

  1. You would have to provide a set of evaluated configurations/pipelines over time that are supposed to be visualized.
  2. If additional insights about specific configurations should be displayed, access to the fitted models is also necessary for some on-the-fly predictions.
  3. Access to the train/test data set.

The logic for integrating frameworks is implemented in the adapter package. These adapters are responsible for converting an arbitrary object to a RunHistory.
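
To illustrate the adapter idea, here is a minimal sketch in Python. It is not XAutoML's actual code: `EvaluatedPipeline`, `SimpleRunHistory`, `records_to_run_history`, and the record keys (`model_id`, `params`, `score`, `timestamp`, `model`) are hypothetical names chosen for this example. It only shows how a list of framework-specific records could be converted into one common, time-ordered structure.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class EvaluatedPipeline:
    """One evaluated configuration (hypothetical structure)."""
    model_id: str
    hyperparameters: Dict[str, Any]
    score: float
    timestamp: float  # e.g. seconds since the start of the search
    fitted_model: Any = None  # only needed for on-the-fly predictions


@dataclass
class SimpleRunHistory:
    """Stand-in for a run history; NOT XAutoML's actual RunHistory class."""
    pipelines: List[EvaluatedPipeline] = field(default_factory=list)

    def best(self) -> EvaluatedPipeline:
        # Assumes that a higher score is better.
        return max(self.pipelines, key=lambda p: p.score)


def records_to_run_history(records: List[Dict[str, Any]]) -> SimpleRunHistory:
    """Convert framework-specific records into the common structure."""
    pipelines = [
        EvaluatedPipeline(
            model_id=r["model_id"],
            hyperparameters=dict(r.get("params", {})),
            score=float(r["score"]),
            timestamp=float(r["timestamp"]),
            fitted_model=r.get("model"),
        )
        for r in records
    ]
    # Order by evaluation time so the search can be visualized over time.
    pipelines.sort(key=lambda p: p.timestamp)
    return SimpleRunHistory(pipelines)
```

A real adapter would additionally need access to the train/test data set, which this sketch leaves out.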

I will try to provide a base implementation for H2O next week and will come back to you if I need help with extracting the required information from H2O.

FavioVazquez commented 2 years ago

Thanks! Please let me know how I can help. We can even set up a Zoom call, we are very interested :)

Ennosigaeon commented 2 years ago

@FavioVazquez I have prepared a first draft for H2O (see the H2O example).

I am currently struggling with three points related to the underlying search space:

  1. Is there some way to get an overview of all available models that are going to be evaluated during the Grid/Random search?
  2. Is there a generic way to obtain the available hyperparameters of each estimator? For example, the GBM class mixes actual hyperparameters (like ntrees or max_depth) with "meta-parameters" like training_frame.
  3. Do you perform any kind of preprocessing that should be displayed in the pipeline overview?
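
The distinction in point 2 can be sketched in Python as follows. This is only an illustration: `split_parameters` is a hypothetical helper, and the set of meta-parameter names is an assumption made for this example (only `training_frame` is taken from the question above), not an official H2O list.

```python
from typing import Any, Dict, FrozenSet, Tuple

# Assumed, hand-maintained set of non-tunable "meta-parameters".
# Only training_frame comes from the discussion; the others are guesses.
ASSUMED_META_PARAMETERS: FrozenSet[str] = frozenset(
    {"training_frame", "validation_frame", "model_id"}
)


def split_parameters(
    params: Dict[str, Any],
    meta_names: FrozenSet[str] = ASSUMED_META_PARAMETERS,
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a flat parameter dict into (hyperparameters, meta-parameters)."""
    hyper = {k: v for k, v in params.items() if k not in meta_names}
    meta = {k: v for k, v in params.items() if k in meta_names}
    return hyper, meta


hyper, meta = split_parameters(
    {"ntrees": 50, "max_depth": 5, "training_frame": "train"}
)
print(hyper)  # {'ntrees': 50, 'max_depth': 5}
print(meta)   # {'training_frame': 'train'}
```

A hand-maintained exclusion list like this is brittle, which is why a generic way to query the tunable hyperparameters would be preferable.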

FavioVazquez commented 2 years ago

Hi @Ennosigaeon, sorry for the delay. I'm working with the development team to answer all of your questions. Do you need anything else from our side?

FavioVazquez commented 2 years ago

@Ennosigaeon here are the answers:

  1. Grid/Random Search needs to be defined by the user, so they have to explicitly specify which algorithm they would like to tune. I don't think we have a list of options in our Python API docs, though.
  2. You can see all the meta-parameters that will be tuned by looking at the function, which lists its parameters. If users are not sure which parameters to tune, it might be helpful for them to read over what AutoML tunes: https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html#random-grid-search-parameters or to use AutoML instead, which will do the automatic grid search for each of the common algorithms.
  3. No, we don't.

FavioVazquez commented 2 years ago

Btw, we don't do data pre-processing in the grid search, but you can use H2O-3 for data munging etc. We have functions for that @Ennosigaeon