Open browshanravan opened 4 years ago
I'm afraid I don't understand your question:
In my example code GaussianNB() was selected as the best estimator, however it seems like the selected_model output from grid.best_estimator_.get_params() does not reflect this
In the above output, the lines
'clf__selected_model',
'clf__selected_model__priors',
'clf__selected_model__var_smoothing',
suggest that the GaussianNB model was selected as the best estimator, as you describe. What am I missing?
Shouldn't it be written as 'clf__selected_model__GaussianNB__priors' instead of 'clf__selected_model__priors'? It is not very easy to determine that the selected model was GaussianNB just by looking at the parameters listed under clf__selected_model. It is not very explicit, given that I have specifically defined ("GaussianNB", GaussianNB()) in my PipelineHelper in my example code.
This will become especially problematic if you have RandomForestClassifier and ExtraTreesClassifier in your PipelineHelper, both of which share almost identical parameters, and you have to figure out which one was chosen as selected_model when calling grid.best_estimator_.get_params().
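As a possible workaround for the ambiguity described above, here is a minimal sketch that reports the class name of every estimator-like value in a get_params() dictionary. The helper name estimator_class_names and the stand-in class are hypothetical and not part of PipelineHelper or scikit-learn. The sketch assumes that the value stored under a key such as clf__selected_model is the estimator instance itself, which the keys quoted in this thread suggest but do not prove.

```python
def estimator_class_names(params):
    """Map each key whose value looks like an estimator to that value's class name.

    `params` is expected to be a dict such as the one returned by
    grid.best_estimator_.get_params().
    """
    return {
        key: type(value).__name__
        for key, value in params.items()
        # Duck typing: scikit-learn style estimators expose get_params().
        if hasattr(value, "get_params")
    }


# Stand-in for a real estimator so the sketch runs without scikit-learn (hypothetical).
class GaussianNBStandIn:
    def get_params(self, deep=True):
        return {"priors": None, "var_smoothing": 1e-9}


example_params = {
    "clf__selected_model": GaussianNBStandIn(),
    "clf__selected_model__priors": None,
    "clf__selected_model__var_smoothing": 1e-9,
}
print(estimator_class_names(example_params))
# prints {'clf__selected_model': 'GaussianNBStandIn'}
```

With a fitted search, the same helper could be called as estimator_class_names(grid.best_estimator_.get_params()). Class names would distinguish RandomForestClassifier from ExtraTreesClassifier, although they would not recover the custom label given in the PipelineHelper.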
Ah OK, I now see what you mean. I agree that this would be helpful, but I'll have to think about the internal changes that this fix would imply.
If this is not a trivial matter, then that is fine. A user can always inspect the grid.best_params_ attribute to see which parameters were chosen. I just thought it would be nice to have it in the grid.best_estimator_.get_params() output as well.
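To illustrate the grid.best_params_ route, here is a small sketch that pulls the model label out of a best_params_-style dictionary. It assumes that each selected_model entry is a (name, parameter dict) tuple, as produced by a PipelineHelper parameter grid. That format is an assumption, and the helper name selected_model_labels is hypothetical.

```python
def selected_model_labels(best_params):
    """Return {parameter key: model label} for every *__selected_model entry.

    Assumes each such entry is a (name, params_dict) tuple; entries in any
    other shape are skipped.
    """
    labels = {}
    for key, value in best_params.items():
        is_selection = key.endswith("__selected_model")
        if is_selection and isinstance(value, tuple) and len(value) == 2:
            name, _model_params = value
            labels[key] = name
    return labels


# Hand-written example in the assumed format, not real output from a search.
example_best_params = {"clf__selected_model": ("GaussianNB", {})}
print(selected_model_labels(example_best_params))
# prints {'clf__selected_model': 'GaussianNB'}
```

With a fitted search this would be called as selected_model_labels(grid.best_params_).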
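I like to play with something like this, especially when one is using two scoring functions:
import pandas as pd
from sklearn.model_selection import GridSearchCV

# pipe, params, X and y are the objects from the example code in this issue
grid = GridSearchCV(pipe, params, scoring='accuracy', verbose=0, n_jobs=-1)
grid.fit(X, y)
# One row per parameter combination, indexed by the parameter dict
df_grid_search = pd.DataFrame(grid.cv_results_)
df_grid_search = df_grid_search.set_index('params')[['mean_fit_time', 'mean_score_time', 'mean_test_score',
                                                     'std_test_score', 'rank_test_score']]
# Best-ranked combinations first
df_grid_search.sort_values(by='rank_test_score').head(10)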
or with more code-noise:
grid = GridSearchCV(pipe, params, scoring='accuracy', verbose=0, n_jobs=-1)
grid.fit(X, y)
df_grid_search = pd.DataFrame(grid.cv_results_)
# Turn each parameter dict into a compact string label without parentheses
df_grid_search['params'] = [str(list(x.values())).replace('(', "").replace(')', "") for x in df_grid_search['params']]
# Keep the timing columns plus every mean_test_* and rank_test_* column
df_grid_search = df_grid_search.set_index('params')[['mean_fit_time', 'mean_score_time'] +
                                                    [x for x in df_grid_search.columns if ('rank_test' in x) or ('mean_test' in x)]]
# Sort by all rank_test_* columns
df_grid_search.sort_values(by=[x for x in df_grid_search.columns if 'rank_test' in x]).head(10)
In my example code, GaussianNB() was selected as the best estimator; however, it seems like the selected_model output from grid.best_estimator_.get_params() does not reflect this, although I have instantiated it as GaussianNB in the PipelineHelper. The selected_model does, however, show the parameters for GaussianNB(), such as priors and var_smoothing. The available_models in the output for grid.get_params().keys() looks fine though.
I suspect this has something to do with the fact that I have left the default parameters for GaussianNB() as they are and did not put anything in the grid_search.
Here is the grid.best_estimator_.get_params() output: