ArturoAmorQ closed this 1 year ago
This is still draft so I did not merge. But feel free to undraft and merge.
I think we should use set_output(transform="pandas")
by default in the notebook titled "Encoding of categorical variables".
The global setting raises a "ValueError: Pandas output does not support sparse data" when training the model at the end of the notebook.
We can still set the output to be a dataframe when creating the instances in the rest of the notebook, and use new instances with the default output for the pipeline.
In the notebook linear_model_regularization, I am wondering if we should advocate for trying to get the feature names from model[:-1].get_feature_names_out(...) or instead have set_output and then access model[-1].feature_names_in_.
+1 for model[-1].feature_names_in_, which should make the code even shorter.
Otherwise LGTM.
Pandas output with the set_output API has been available since v1.2. This PR introduces such a nice feature to the MOOC.