ClimbsRocks / auto_ml

[UNMAINTAINED] Automated machine learning for analytics & production
http://auto-ml.readthedocs.io
MIT License

optimize_final_model and XGBoost vs GradientBoosting #357

Open PaulVanDev opened 6 years ago

PaulVanDev commented 6 years ago

Hi, I tested this a bit further.

  1. I tried to optimize the hyperparameters with "optimize_final_model" on XGBClassifier. After a while it got stuck, printing "saving file at 'mynotebook'". Before that, I saw errors like this one (see the deap sketch after this list):

     AttributeError: Can't get attribute 'Individual' on <module 'deap.creator' from 'C:\Users\pv\Anaconda3\lib\site-packages\deap\creator.py'>

     What's wrong?

  2. I used my data for some predictions:

     Model | Accuracy | Training time
     ----- | -------- | -------------
     auto_ml - GradientBoosting | 95.50% | 40 min
     Gradient Boosting | 93.20% | 15 min
     auto_ml - XGBoost | 92.90% | 9 sec
The results puzzle me :)
Normally XGBoost on 8 cores is roughly 10x-20x faster than gradient boosting; here it is at least 100x faster. Why? And why does XGBoost score worse than GradientBoosting? Also, is there a way to read back the (default) parameters from a trained model so I can inspect them?
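
On that last question, a minimal sketch of how the hyperparameters of a trained estimator can be read back, assuming you can get at the underlying scikit-learn / XGBoost objects; the toy data and variable names are placeholders, not the original setup:

```python
# Minimal sketch: reading back the hyperparameters a trained estimator used.
# The toy data and variable names are placeholders, not the original setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

gb_model = GradientBoostingClassifier().fit(X, y)
xgb_model = XGBClassifier().fit(X, y)

# Every scikit-learn compatible estimator exposes get_params().
print(gb_model.get_params())
print(xgb_model.get_params())

# XGBoost's sklearn wrapper also exposes the parameters passed to the booster.
print(xgb_model.get_xgb_params())
```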

Paul

PaulVanDev commented 6 years ago

.train(data_d1[feature_columns], model_names=['XGBClassifier', 'RandomForestClassifier', 'GradientBoostingClassifier'])

In this case XGBoost is selected, with a score of 93.2, which is better than XGBoost alone and lower than GradientBoosting alone.

And with compare_all_models=True, RandomForest is selected.

Doesn't look very consistent ;)
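
For completeness, a sketch of roughly how those two runs look with auto_ml's Predictor API; the CSV path, the 'label' output column, and the train/test split are placeholders, not the original data:

```python
# Rough sketch of the comparison above using auto_ml's Predictor API.
# The CSV path, 'label' column, and split are placeholders for the real data.
import pandas as pd
from sklearn.model_selection import train_test_split
from auto_ml import Predictor

df = pd.read_csv('my_data.csv')  # hypothetical dataset
df_train, df_test = train_test_split(df, test_size=0.2, random_state=0)

column_descriptions = {'label': 'output'}  # placeholder target column

ml_predictor = Predictor(type_of_estimator='classifier',
                         column_descriptions=column_descriptions)

# Restrict the search to the three model families mentioned above...
ml_predictor.train(df_train,
                   model_names=['XGBClassifier',
                                'RandomForestClassifier',
                                'GradientBoostingClassifier'])

# ...or let auto_ml compare all supported models instead:
# ml_predictor.train(df_train, compare_all_models=True)

print(ml_predictor.score(df_test, df_test.label))
```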