Open thomasjpfan opened 4 years ago
Thanks. Huh I didn't scale? That's... weird and a pretty obvious oversight. Target encoder doesn't work better though?
Target encoder does not work better:
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from category_encoders import TargetEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.datasets import fetch_openml
X, y = fetch_openml("house_sales", as_frame=True, version=2, return_X_y=True)
X = X.drop(['date'], axis=1)
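prep = ColumnTransformer([('encoder', TargetEncoder(), ['zipcode'])],
                         # The original call was cut off here; scaling the remaining
                         # columns is assumed from the otherwise unused StandardScaler import.
                         remainder=StandardScaler())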
pipe = Pipeline([('prep', prep), ('clf', Ridge())])
scores = cross_val_score(pipe, X, y)
print(scores.mean())  # assumed: the figure below is the mean over the folds
# 0.7862
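For readers unfamiliar with the technique being compared, here is a minimal sketch of what target encoding does, assuming plain per-category mean encoding with no smoothing. `mean_target_encode` and the toy values are hypothetical and only for illustration; `category_encoders.TargetEncoder` additionally blends each category mean with the global mean, so its output will differ.

```python
from collections import defaultdict


def mean_target_encode(categories, targets):
    """Replace each category with the mean target of the rows that share it.

    Hypothetical helper: plain mean encoding, no smoothing toward the global mean.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    means = {category: sums[category] / counts[category] for category in sums}
    return [means[category] for category in categories]


# Toy data: the column name mirrors the thread, the values are made up.
zipcodes = ["98001", "98001", "98002", "98002", "98003"]
prices = [100.0, 200.0, 300.0, 500.0, 250.0]

encoded = mean_target_encode(zipcodes, prices)
print(encoded)
```

In the pipeline above the encoder is fit inside `cross_val_score`, so the category means come from the training folds only; computing them on the full data first would leak the target into the validation folds.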
Huh. Ames housing or Melbourne housing then? Or any of the ones from Gael's paper? Or is that all classification?
When the numerical data is scaled, the one-hot encoder works pretty well: