AxeldeRomblay / MLBox

MLBox is a powerful Automated Machine Learning Python library.
https://mlbox.readthedocs.io/en/latest/

Redundant results and model overfitting #107

Closed mahatibharadwaj closed 3 years ago

mahatibharadwaj commented 3 years ago
  1. We are getting the same results irrespective of the number of max_evals and seed changes.
  2. We have increased n_folds and also reduced max_evals to see if we get different results, but every combination of parameter settings gives the same results. I think the model is over-fitting during training. Is there any other way we can check and stop this to get better results?
  3. In our use case we have not used the ne, ce, and fs params in the 'space' settings. Is there a way to use stacking regression without these? We are not able to resolve the errors that occur when using stacking with only the algorithm-selection params of the regression strategy (a sketch of such a space is shown below).

I will be grateful if you can help me resolve the above issues. Thanks.
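For context, the estimator-only space mentioned in item 3 would look roughly like this; the parameter names follow the MLBox documentation and the values are purely illustrative (the actual snippet was shared as a screenshot further down):

```python
# Illustrative search space containing only estimator-related entries,
# i.e. no ne__* (NA encoder), ce__* (categorical encoder) or fs__* (feature selector) keys.
# How to plug a stacking regressor into such a space is the open question in this issue.
space = {
    'est__strategy': {"search": "choice", "space": ["LightGBM"]},
    'est__max_depth': {"search": "choice", "space": [5, 6]},
    'est__subsample': {"search": "uniform", "space": [0.6, 0.9]},
}
```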

AxeldeRomblay commented 3 years ago

Hello @mahatibharadwaj! Can you share a screenshot, please?

mahatibharadwaj commented 3 years ago

> Hello @mahatibharadwaj! Can you share a screenshot, please?

This is the code snippet I have used: [attached image: Screenshot (61)]

mahatibharadwaj commented 3 years ago

> Hello @mahatibharadwaj! Can you share a screenshot, please?

[attached image: Screenshot (62)] This is the code snippet for the stacking regressor. How do we use this space to fit the model with the stacking regressor? Please help.
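In case it helps, here is a heavily hedged sketch of fitting the stacking model directly, assuming the Regressor and StackingRegressor classes (and import path) documented for MLBox; the synthetic data and all parameter values are purely illustrative:

```python
import numpy as np
import pandas as pd
from mlbox.model.regression import Regressor, StackingRegressor  # import path assumed from the MLBox docs

# Tiny synthetic, already-numeric data purely for illustration
# (the stacking model expects cleaned and encoded features, i.e. after the ne/ce/fs steps)
rng = np.random.RandomState(0)
X_train = pd.DataFrame(rng.rand(100, 3), columns=["f1", "f2", "f3"])
y_train = pd.Series(rng.rand(100), name="target")
X_test = pd.DataFrame(rng.rand(20, 3), columns=["f1", "f2", "f3"])

# Level-0 base regressors; their out-of-fold predictions feed the level-1 estimator
stacking = StackingRegressor(
    base_estimators=[Regressor(strategy="LightGBM"),
                     Regressor(strategy="RandomForest"),
                     Regressor(strategy="ExtraTrees")],
    n_folds=5,
    random_state=42,
)
stacking.fit(X_train, y_train)
predictions = stacking.predict(X_test)
```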

AxeldeRomblay commented 3 years ago

Thanks! So here are the answers:

  1. np.random.seed doesn't help; you have to use the 'random_state' argument of the Optimiser class directly (see the sketch below). Regarding max_evals: as you have few configurations to test, it is normal to get the same results...
  2. What do you mean by different results? Different best hyperparameters, or different scores?
  3. Please refer to: https://github.com/AxeldeRomblay/MLBox/issues/88
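For reference, a minimal sketch of point 1 following the pattern in the MLBox docs: the seed goes through the Optimiser's random_state argument rather than np.random.seed, and max_evals only changes anything if the space has enough configurations to explore. The file paths, target name and space values below are illustrative:

```python
from mlbox.preprocessing import Reader
from mlbox.optimisation import Optimiser

# Hypothetical file paths and target column, for illustration only
df = Reader(sep=",").train_test_split(["train.csv", "test.csv"], target_name="target")

# A wider space, so that max_evals has genuinely different configurations to try
space = {
    'est__strategy': {"search": "choice", "space": ["LightGBM", "RandomForest", "ExtraTrees"]},
    'est__max_depth': {"search": "choice", "space": [4, 6, 8, 10]},
    'est__n_estimators': {"search": "choice", "space": [200, 400, 800]},
}

# Seed the search through the Optimiser itself, not through np.random.seed
opt = Optimiser(n_folds=5, random_state=42)
best_params = opt.optimise(space, df, max_evals=40)
```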
mahatibharadwaj commented 3 years ago

> Thanks! So here are the answers:
>
>   1. np.random.seed doesn't help; you have to use the 'random_state' argument of the Optimiser class directly. Regarding max_evals: as you have few configurations to test, it is normal to get the same results...
>   2. What do you mean by different results? Different best hyperparameters, or different scores?
>   3. Please refer to: #88

  1. By same results I mean that, irrespective of the combination of parameter settings, the predicted values of the target variable are the same up to the 6th decimal place. Please let me know how to resolve this.

Also, I have another issue: the joblib directory is not getting created inside the save folder.