Step 3:
From what I have noticed so far, not all the categories in columns with fewer than 10 unique categories get their own one-hot-encoded column, even when the description says top_n=10. Certain categories have been dropped from the one-hot-encoded columns, apparently when a category occurs in less than 0.5% of the data or in only 10 rows. And if this happens inside the pipeline search, I can't tell whether all the categories of a column that don't end up with a separate column are pushed into a single column called something like Column_others.
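To make the expected behaviour concrete, here is a minimal sketch of top-N one-hot encoding that pools leftover categories into one "others" column instead of silently dropping them. The function name top_n_one_hot, the min_fraction/min_count thresholds and the "<column>_others" naming are hypothetical and only illustrate the request; they are not taken from the tool's code. For comparison, scikit-learn's OneHotEncoder (1.1+) has max_categories and min_frequency options that likewise group infrequent categories into a single extra column.

```python
from collections import Counter


def top_n_one_hot(values, column, top_n=10, min_fraction=0.0, min_count=1):
    """Hypothetical top-N one-hot encoder with an '<column>_others' bucket.

    Keeps at most `top_n` categories, and only those seen in at least
    `min_count` rows and at least `min_fraction` of all rows. Every other
    category is pooled into one '<column>_others' column, not dropped.
    Returns (column_names, rows, pooled_categories).
    """
    n_rows = len(values)
    counts = Counter(values)
    # Most frequent first; ties broken by category name for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    kept = [
        cat
        for cat, cnt in ranked[:top_n]
        if cnt >= min_count and cnt / n_rows >= min_fraction
    ]
    kept_set = set(kept)
    pooled = [cat for cat in counts if cat not in kept_set]

    names = [f"{column}_{cat}" for cat in kept]
    if pooled:
        # One shared column for everything that did not get its own column.
        names.append(f"{column}_others")

    rows = []
    for v in values:
        row = [1 if v == cat else 0 for cat in kept]
        if pooled:
            row.append(0 if v in kept_set else 1)
        rows.append(row)
    return names, rows, sorted(map(str, pooled))


if __name__ == "__main__":
    data = ["a"] * 50 + ["b"] * 30 + ["c"] * 5 + ["d"] * 1
    names, rows, pooled = top_n_one_hot(data, "color", top_n=10, min_count=10)
    print(names)   # ['color_a', 'color_b', 'color_others']
    print(pooled)  # ['c', 'd']
```

With a report like the pooled list, a user could see exactly which categories were merged and why, instead of guessing from the output columns.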
Step 4:
This information about the model isn't always enough to reproduce the results. So, please display all the parameters, not just the ones you set. Almost all models from popular packages nowadays have a get_params() method.
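A minimal sketch of what "display all the parameters" could look like, assuming the model either has a scikit-learn style get_params() or follows the scikit-learn convention of storing each constructor argument as an attribute of the same name. The helpers all_params and format_params and the ToyClassifier class are hypothetical, made up for this illustration; only get_params() itself is a real scikit-learn method.

```python
import inspect


def all_params(model):
    """Return every constructor parameter of `model`, defaults included."""
    if hasattr(model, "get_params"):
        # scikit-learn style estimators report all __init__ arguments here.
        return dict(model.get_params())
    # Fallback (assumption): read the __init__ signature and look up the
    # attribute of the same name, falling back to the declared default.
    sig = inspect.signature(type(model).__init__)
    params = {}
    for name, p in sig.parameters.items():
        if name == "self" or p.kind in (p.VAR_POSITIONAL, p.VAR_KEYWORD):
            continue
        params[name] = getattr(model, name, p.default)
    return params


def format_params(model):
    """Render the model with *all* its parameters, sorted by name."""
    items = sorted(all_params(model).items())
    args = ", ".join(f"{k}={v!r}" for k, v in items)
    return f"{type(model).__name__}({args})"


class ToyClassifier:
    """Hypothetical model that stores its constructor arguments as attributes."""

    def __init__(self, n_estimators=100, max_depth=None, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.learning_rate = learning_rate


if __name__ == "__main__":
    # Only n_estimators is set, but the defaults are shown too.
    print(format_params(ToyClassifier(n_estimators=300)))
```

Showing the defaults alongside the values that were explicitly set is what makes the printed description enough to rebuild the same model later.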
Improve the description of the pipeline.
From this:
To something like the following:
For Steps 1 and 2: