ClimbsRocks / auto_ml

[UNMAINTAINED] Automated machine learning for analytics & production
http://auto-ml.readthedocs.io
MIT License

The model_names don't show when the ensemble_config is set #427

Open ghk829 opened 5 years ago

ghk829 commented 5 years ago

I ran this code:

import os
os.environ['is_test_suite'] = "True"  # set because of the multiprocessing/pickling bug I reported in #426
from auto_ml import Predictor
from auto_ml.utils import get_boston_dataset
from auto_ml.utils_models import load_ml_model

# Load data
df_train, df_test = get_boston_dataset()

# Tell auto_ml which column is 'output'
# Also note columns that aren't purely numerical
# Examples include ['nlp', 'date', 'categorical', 'ignore']
#
column_descriptions = {
    'CHAS': 'output'
}

ml_predictor = Predictor(type_of_estimator='classification', column_descriptions=column_descriptions)
ml_predictor.train(df_train, ensemble_config=[{"model_name": "GradientBoostingClassifier"}, {"model_name": "RandomForestClassifier"}], perform_feature_selection=True)

but the result only shows this:

ml_predictor.model_names
['GradientBoostingClassifier']
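For reference, with two entries in the ensemble_config I'd expect two names to appear. A minimal sketch of that expectation in plain Python (the `expected_model_names` helper is hypothetical, not part of auto_ml's API):

```python
# Hypothetical helper: given the ensemble_config passed to train(),
# list the model names that model_names should contain afterwards.
def expected_model_names(ensemble_config):
    return [entry["model_name"] for entry in ensemble_config]

ensemble_config = [
    {"model_name": "GradientBoostingClassifier"},
    {"model_name": "RandomForestClassifier"},
]
print(expected_model_names(ensemble_config))
# ['GradientBoostingClassifier', 'RandomForestClassifier']
```

Only the first of these two names shows up in `ml_predictor.model_names`.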
loaiabdalslam commented 5 years ago

Run `pip install -r advanced_requirements.txt` and check whether tensorflow and keras are installed on your machine or in your Anaconda env. I ran your code and it ran successfully:

Welcome to auto_ml! We're about to go through and make sense of your data using machine learning, and give you a production-ready pipeline to get predictions with.

If you have any issues, or new feature ideas, let us know at http://auto.ml
You are running on version 2.9.10
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'learning_rate': 0.1, 'warm_start': True}
Running basic data cleaning
Fitting DataFrameVectorizer
Now using the model training_params that you passed in:
{}
After overwriting our defaults with your values, here are the final params that will be used to initialize the model:
{'presort': False, 'learning_rate': 0.1, 'warm_start': True}

********************************************************************************************
About to run GridSearchCV on the pipeline for several models to predict CHAS
Fitting 2 folds for each of 2 candidates, totalling 4 fits
[CV] _scorer=<auto_ml.utils_scoring.RegressionScorer object at 0x7f8174e77780>, model=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                          learning_rate=0.1, loss='ls', max_depth=3,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=1, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=100,
                          n_iter_no_change=None, presort=False,
                          random_state=None, subsample=1.0, tol=0.0001,
                          validation_fraction=0.1, verbose=0, warm_start=True) 
[1] random_holdout_set_from_training_data's score is: -0.244
[2] random_holdout_set_from_training_data's score is: -0.239
[3] random_holdout_set_from_training_data's score is: -0.239
[4] random_holdout_set_from_training_data's score is: -0.234
[5] random_holdout_set_from_training_data's score is: -0.237
[6] random_holdout_set_from_training_data's score is: -0.238
[7] random_holdout_set_from_training_data's score is: -0.237
[8] random_holdout_set_from_training_data's score is: -0.235
[9] random_holdout_set_from_training_data's score is: -0.233
[10] random_holdout_set_from_training_data's score is: -0.232
[11] random_holdout_set_from_training_data's score is: -0.232
[12] random_holdout_set_from_training_data's score is: -0.235
[13] random_holdout_set_from_training_data's score is: -0.236
[14] random_holdout_set_from_training_data's score is: -0.235
[15] random_holdout_set_from_training_data's score is: -0.239
[16] random_holdout_set_from_training_data's score is: -0.239
[17] random_holdout_set_from_training_data's score is: -0.239
[18] random_holdout_set_from_training_data's score is: -0.24
[19] random_holdout_set_from_training_data's score is: -0.241
[20] random_holdout_set_from_training_data's score is: -0.241
[21] random_holdout_set_from_training_data's score is: -0.241
[22] random_holdout_set_from_training_data's score is: -0.242
[23] random_holdout_set_from_training_data's score is: -0.241
[24] random_holdout_set_from_training_data's score is: -0.245
[25] random_holdout_set_from_training_data's score is: -0.245
[26] random_holdout_set_from_training_data's score is: -0.244
[27] random_holdout_set_from_training_data's score is: -0.245
[28] random_holdout_set_from_training_data's score is: -0.246
[29] random_holdout_set_from_training_data's score is: -0.245
[30] random_holdout_set_from_training_data's score is: -0.253
[31] random_holdout_set_from_training_data's score is: -0.254
The number of estimators that were the best for this training dataset: 11
The best score on the holdout set: -0.231888160976642
[CV]  _scorer=<auto_ml.utils_scoring.RegressionScorer object at 0x7f8174e77780>, model=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                          learning_rate=0.1, loss='ls', max_depth=3,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=1, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=100,
                          n_iter_no_change=None, presort=False,
                          random_state=None, subsample=1.0, tol=0.0001,
                          validation_fraction=0.1, verbose=0, warm_start=True), score=-0.256, total=   0.1s
[CV] _scorer=<auto_ml.utils_scoring.RegressionScorer object at 0x7f8174e77780>, model=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                          learning_rate=0.1, loss='ls', max_depth=3,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=1, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=31,
                          n_iter_no_change=None, presort=False,
                          random_state=None, subsample=1.0, tol=0.0001,
                          validation_fraction=0.1, verbose=0, warm_start=True) 
[1] random_holdout_set_from_training_data's score is: -0.229
[2] random_holdout_set_from_training_data's score is: -0.225
[3] random_holdout_set_from_training_data's score is: -0.216
[4] random_holdout_set_from_training_data's score is: -0.215
[5] random_holdout_set_from_training_data's score is: -0.211
[6] random_holdout_set_from_training_data's score is: -0.207
[7] random_holdout_set_from_training_data's score is: -0.206
[8] random_holdout_set_from_training_data's score is: -0.208
[9] random_holdout_set_from_training_data's score is: -0.201
[10] random_holdout_set_from_training_data's score is: -0.2
[11] random_holdout_set_from_training_data's score is: -0.2
[12] random_holdout_set_from_training_data's score is: -0.202
[13] random_holdout_set_from_training_data's score is: -0.197
[14] random_holdout_set_from_training_data's score is: -0.196
[15] random_holdout_set_from_training_data's score is: -0.197
[16] random_holdout_set_from_training_data's score is: -0.198
[17] random_holdout_set_from_training_data's score is: -0.198
[18] random_holdout_set_from_training_data's score is: -0.2
[19] random_holdout_set_from_training_data's score is: -0.201
[20] random_holdout_set_from_training_data's score is: -0.2
[21] random_holdout_set_from_training_data's score is: -0.205
[22] random_holdout_set_from_training_data's score is: -0.206
[23] random_holdout_set_from_training_data's score is: -0.206
[24] random_holdout_set_from_training_data's score is: -0.208
[25] random_holdout_set_from_training_data's score is: -0.205
[26] random_holdout_set_from_training_data's score is: -0.205
[27] random_holdout_set_from_training_data's score is: -0.202
[28] random_holdout_set_from_training_data's score is: -0.202
[29] random_holdout_set_from_training_data's score is: -0.202
[30] random_holdout_set_from_training_data's score is: -0.204
[31] random_holdout_set_from_training_data's score is: -0.205
[32] random_holdout_set_from_training_data's score is: -0.205
[33] random_holdout_set_from_training_data's score is: -0.205
[34] random_holdout_set_from_training_data's score is: -0.206
The number of estimators that were the best for this training dataset: 14
The best score on the holdout set: -0.19568170577598212
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.1s remaining:    0.0s
[CV]  _scorer=<auto_ml.utils_scoring.RegressionScorer object at 0x7f8174e77780>, model=GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
                          learning_rate=0.1, loss='ls', max_depth=3,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=1, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=31,
                          n_iter_no_change=None, presort=False,
                          random_state=None, subsample=1.0, tol=0.0001,
                          validation_fraction=0.1, verbose=0, warm_start=True), score=-0.248, total=   0.1s
[CV] _scorer=<auto_ml.utils_scoring.RegressionScorer object at 0x7f8174e77780>, model=RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
                      max_features='auto', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=30, n_jobs=1,
                      oob_score=False, random_state=None, verbose=0,
                      warm_start=False) 
[CV]  _scorer=<auto_ml.utils_scoring.RegressionScorer object at 0x7f8174e77780>, model=RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
                      max_features='auto', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=30, n_jobs=1,
                      oob_score=False, random_state=None, verbose=0,
                      warm_start=False), score=-0.236, total=   0.1s
[CV] _scorer=<auto_ml.utils_scoring.RegressionScorer object at 0x7f8174e77780>, model=RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
                      max_features='auto', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=30, n_jobs=1,
                      oob_score=False, random_state=None, verbose=0,
                      warm_start=False) 
[CV]  _scorer=<auto_ml.utils_scoring.RegressionScorer object at 0x7f8174e77780>, model=RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=None,
                      max_features='auto', max_leaf_nodes=None,
                      min_impurity_decrease=0.0, min_impurity_split=None,
                      min_samples_leaf=1, min_samples_split=2,
                      min_weight_fraction_leaf=0.0, n_estimators=30, n_jobs=1,
                      oob_score=False, random_state=None, verbose=0,
                      warm_start=False), score=-0.262, total=   0.1s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.2s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   3 out of   3 | elapsed:    0.3s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    0.4s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   4 out of   4 | elapsed:    0.4s finished
The best CV score from our hyperparameter search (by default averaging across k-fold CV) for CHAS is:
-0.24878969905471796
The best params were
{'model': 'RandomForestRegressor'}
Here are all the hyperparameters that were tried:
Score in the following columns always refers to cross-validation score
+--------------+
|   mean_score |
|--------------|
|      -0.2519 |
|      -0.2488 |
+--------------+
Calculating feature responses, for advanced analytics.
Here are our feature responses for the trained model
+----+----------------+---------+-------------------+-------------------+-----------+-----------+
|    | Feature Name   |   Delta |   FR_Decrementing |   FR_Incrementing |   FRD_MAD |   FRI_MAD |
|----+----------------+---------+-------------------+-------------------+-----------+-----------|
| 12 | AGE            | 13.9801 |            0.0050 |           -0.0002 |    0.0000 |    0.0000 |
| 11 | LSTAT          |  3.5508 |            0.0263 |            0.0011 |    0.0000 |    0.0000 |
| 10 | ZN             | 11.5619 |           -0.0005 |            0.0012 |    0.0000 |    0.0000 |
|  9 | DIS            |  1.0643 |            0.1068 |            0.0017 |    0.0000 |    0.0000 |
|  8 | B              | 45.7266 |           -0.0071 |            0.0026 |    0.0000 |    0.0000 |
|  7 | RM             |  0.3543 |            0.0022 |            0.0073 |    0.0000 |    0.0000 |
|  6 | TAX            | 82.9834 |            0.0111 |           -0.0075 |    0.0000 |    0.0000 |
|  5 | PTRATIO        |  1.1130 |            0.0111 |           -0.0081 |    0.0000 |    0.0000 |
|  4 | MEDV           |  4.6603 |           -0.0050 |            0.0104 |    0.0000 |    0.0000 |
|  3 | CRIM           |  4.4320 |            0.0123 |            0.0221 |    0.0000 |    0.0000 |
|  2 | INDUS          |  3.4430 |            0.0060 |            0.0229 |    0.0000 |    0.0000 |
|  1 | RAD            |  4.2895 |           -0.0021 |            0.0350 |    0.0000 |    0.0000 |
|  0 | NOX            |  0.0588 |           -0.0099 |            0.0545 |    0.0000 |    0.0000 |
+----+----------------+---------+-------------------+-------------------+-----------+-----------+
<auto_ml.predictor.Predictor at 0x7f81a83eb320>
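Before re-running, a quick way to do the dependency check suggested above (whether tensorflow and keras are importable in the current env) is a minimal sketch like this, using only the standard library:

```python
import importlib.util

# Report whether the optional deep-learning deps are importable in this env,
# without actually importing them (find_spec only locates the package).
status = {
    pkg: importlib.util.find_spec(pkg) is not None
    for pkg in ("tensorflow", "keras")
}
for pkg, installed in status.items():
    print(pkg, "installed" if installed else "missing")
```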
ghk829 commented 5 years ago

@loaiabdalslam thank you for checking the issue. What parameters did you set for the run? The same as mine?

loaiabdalslam commented 5 years ago

Yeah, just create a new Python virtual env and try installing everything in it.

Aun0124 commented 3 years ago

With the latest version and the latest documentation, how do I pass a list of models and let it choose the best one? I didn't get a recommendation for the best model.
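For reference, the "choose the best" step visible in the log above boils down to taking the model with the highest mean cross-validation score. A minimal sketch using the two scores from the mean_score table (auto_ml's scorer reports negative errors, so the value closer to zero wins):

```python
# Mean CV scores from the hyperparameter-search table in the log above.
cv_scores = {
    "GradientBoostingRegressor": -0.2519,
    "RandomForestRegressor": -0.2488,
}

# Pick the model with the highest (least negative) mean CV score.
best_model = max(cv_scores, key=cv_scores.get)
print(best_model)
# RandomForestRegressor
```

This matches the log's reported best params, `{'model': 'RandomForestRegressor'}`.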