Closed by fonhorst 2 years ago
GitHub: Stars - 161, Forks - 35, Contributors - 7
Releases: Last Release - 0.8.1 (28th May 2021), First Release - (7th March 2020)
Language: Scala, Python (pyspark)
Data Types: Structured
Execution backends: Spark
Distributed framework: Spark
Distributed algorithms libraries: SparkML
AutoML pipeline elements:
Elements of pipeline are covered in distributed mode: all
Is it possible to mix distributed and non-distributed backends?: No
How is the pipeline converted to computing applications?: Distributed application
Approach to ML models selection computing: Multiple distributed models (?)
Data processing steps pruning: Yes
Resource managers: Spark's resource managers
Hyperparameter tuner: Spark ML
Task types: Classification, Regression
Tests: local mode, distributed mode.
Comments: --
Toolkit for Apache Spark ML covering feature clean-up, a feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
https://github.com/databrickslabs/automl-toolkit
1) GitHub stars: 161
2) Last release: 0.8.1 (May 26, 2021); first release: 0.7.0 (Feb 23, 2020)
3) Language: Scala 2.12, Python (pyspark)
Currently supported models:
"XGBoost" - XGBoost Classifier or XGBoost Regressor
"RandomForest" - Random Forest Classifier or Random Forest Regressor
"GBT" - Gradient Boosted Trees Classifier or Gradient Boosted Trees Regressor
"Trees" - Decision Tree Classifier or Decision Tree Regressor
"LinearRegression" - Linear Regressor
"LogisticRegression" - Logistic Regressor (supports both Binomial and Multinomial)
"MLPC" - Multi-Layer Perceptron Classifier
"SVM" - Linear Support Vector Machines
"LightGBM" - LightGBM (currently suspended, pending library improvements to LightGBM)
https://github.com/databrickslabs/automl-toolkit/tree/master/src/main/scala/com/databricks/labs/automl/model
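The model list above can be summarized as a small lookup table, which makes it easier to see which families apply to each of the two task types. This is only an illustrative sketch in plain Python, not part of the toolkit's API: the names `SUPPORTED_TASKS`, `SUSPENDED` and `families_for` are hypothetical, the task labels "classifier" and "regressor" are chosen here for illustration, and treating "SVM" as classification-only is an assumption, since the list does not state it.

```python
from typing import Dict, FrozenSet, List

BOTH = frozenset({"classifier", "regressor"})

# Task types per model family, transcribed from the list above.
# Assumption: "SVM" is treated as classification-only (not stated in the list).
SUPPORTED_TASKS: Dict[str, FrozenSet[str]] = {
    "XGBoost": BOTH,
    "RandomForest": BOTH,
    "GBT": BOTH,
    "Trees": BOTH,
    "LinearRegression": frozenset({"regressor"}),
    "LogisticRegression": frozenset({"classifier"}),
    "MLPC": frozenset({"classifier"}),
    "SVM": frozenset({"classifier"}),
}

# Listed as currently suspended, so kept out of the candidate table.
SUSPENDED: FrozenSet[str] = frozenset({"LightGBM"})


def families_for(task: str) -> List[str]:
    """Return the model family names usable for the given task type."""
    if task not in BOTH:
        raise ValueError(f"unknown task type: {task!r}")
    # Dict insertion order is preserved, so results follow the list order.
    return [name for name, tasks in SUPPORTED_TASKS.items() if task in tasks]


if __name__ == "__main__":
    print("classifier:", families_for("classifier"))
    print("regressor:", families_for("regressor"))
```

Under these assumptions, seven families are candidates for classification and five for regression, with "LightGBM" excluded from both while it is suspended.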