Releases: Last Release - 0.14 (19th September 2021), First Release - (17th October 2017)
Language: Python 3.7 - 3.9
Data Types: Structured data. The estimator is given something like X_train, y_train. On the other hand, unstructured data (such as images) can be preprocessed and converted into structured feature vectors.
Execution backends: joblib, Dask
Distributed framework: Dask
Distributed algorithms libraries: None; it uses sklearn under the hood.
Elements of pipeline are covered in distributed mode: N/A
Is it possible to mix distributed and non-distributed backends?: Within one pipeline - no.
How is the pipeline converted to computing applications?: The pipeline may run distributed or locally. It is unclear whether the entire pipeline is distributed; some steps are probably executed only on the client in local mode.
Resource managers: TPOT is able to use the resource managers supported by Dask - YARN, Kubernetes, Slurm, LSF, SGE and others (https://docs.dask.org/en/latest/how-to/deploy-dask/hpc.html). A cluster can also be deployed on cloud platforms such as AWS, GCP, and Azure.
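The execution-backend split noted above (joblib for local runs, Dask for distributed ones) can be sketched with plain sklearn and joblib; this is an illustrative sketch, not TPOT-specific code, and the Dask variant is shown commented out because it assumes a running cluster at a hypothetical scheduler address:

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=50, n_jobs=-1, random_state=0)

# Local execution: joblib's default process-based backend.
with joblib.parallel_backend("loky"):
    clf.fit(X, y)

# Distributed execution (assumes dask.distributed and a running cluster):
#   from dask.distributed import Client
#   client = Client("tcp://scheduler-address:8786")  # hypothetical address
#   with joblib.parallel_backend("dask"):
#       clf.fit(X, y)

print(round(clf.score(X, y), 2))
```

Because sklearn parallelizes through joblib, swapping the backend context manager is enough to move the same fit onto a Dask cluster without touching the estimator code.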
https://github.com/automl/auto-sklearn
GitHub: Stars - 5.8k, Forks - 1.1k, Contributors - 68
Releases: Last Release - 0.14 (19th September 2021), First Release - (17th October 2017)
Language: Python 3.7 - 3.9
Data Types: Structured data. The estimator is given something like X_train, y_train. On the other hand, unstructured data (such as images) can be preprocessed and converted into structured feature vectors.
Execution backends: joblib, Dask
Distributed framework: Dask
Distributed algorithms libraries: None; it uses sklearn under the hood.
AutoML pipeline elements:
Elements of pipeline are covered in distributed mode: N/A
Is it possible to mix distributed and non-distributed backends?: Within one pipeline - no.
How is the pipeline converted to computing applications?: The pipeline may run distributed or locally. It is unclear whether the entire pipeline is distributed; some steps are probably executed only on the client in local mode.
Approach to ML model selection computing: multiple models are trained in a distributed fashion (https://ml.dask.org/hyper-parameter-search.html)
Data processing steps pruning: repeated work can be avoided using Dask features (https://ml.dask.org/hyper-parameter-search.html#avoid-repeated-work)
Resource managers: auto-sklearn is able to use the resource managers supported by Dask - YARN, Kubernetes, Slurm, LSF, SGE and others (https://docs.dask.org/en/latest/how-to/deploy-dask/hpc.html). A cluster can also be deployed on cloud platforms such as AWS, GCP, and Azure.
Hyperparameter tuner: Dask hyperparameter search (https://ml.dask.org/hyper-parameter-search.html), sklearn
Task types: Classification, Regression
Tests: local mode, distributed mode.
Comments: https://www.automl.org/
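The "avoid repeated work" idea above refers to dask-ml's pipeline-aware hyperparameter search, which skips refitting shared early pipeline steps. A similar effect can be sketched locally with plain sklearn, whose Pipeline supports a `memory` cache; this is an analogous mechanism for illustration, not the dask-ml implementation itself:

```python
from tempfile import mkdtemp
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

# memory=... caches fitted transformers on disk, so the PCA step is not
# refit for every classifier setting that shares the same PCA parameters.
pipe = Pipeline(
    [("pca", PCA(n_components=2)), ("clf", LogisticRegression(max_iter=500))],
    memory=mkdtemp(),
)
search = GridSearchCV(pipe, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X, y)
print(search.best_params_)
```

dask-ml's searches apply the same pruning automatically across a cluster, which matters when early pipeline steps (scaling, decomposition) dominate the fit time.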