We can also create some features ourselves.
For example:
for df in combine:
    df['life_pct'] = df['life_sq'] / df['full_sq'].astype(float)    # share of the total area that is living area
    df['rel_kitch'] = df['kitch_sq'] / df['full_sq'].astype(float)  # share of the total area taken up by the kitchen
    df['rel_floor'] = df['floor'] / df['max_floor'].astype(float)   # floor relative to the building's top floor
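One caveat with ratio features like these: if full_sq or max_floor is 0 in some row, the division gives inf (or NaN for 0/0). Below is a minimal sketch of one way to guard against that. The helper name safe_ratio and the toy values are hypothetical and not part of the original code.

```python
import numpy as np
import pandas as pd

def safe_ratio(num, den):
    """Element-wise num / den, with NaN wherever the denominator is 0."""
    den = den.astype(float).replace(0, np.nan)
    return num / den

# Toy data, made up for illustration only.
toy = pd.DataFrame({
    'life_sq': [30.0, 20.0, 10.0],
    'full_sq': [60.0, 0.0, 40.0],
})
toy['life_pct'] = safe_ratio(toy['life_sq'], toy['full_sq'])
print(toy)
```

Rows with a zero denominator end up as NaN, which can then be imputed or left for a model that handles missing values.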
And for the timestamp and sub.area, we can use frequency counts:
for df in combine:
    # encode year and month as a single number, YYYYMM
    month_year = (df.timestamp.dt.month + (df.timestamp.dt.year)*100)
    # number of rows in each year-month
    month_year_cnt_map = month_year.value_counts().to_dict()
    df['month_year_cnt'] = month_year.map(month_year_cnt_map)
This uses the frequency count of each value (each month, or each area): the more often a nominal value occurs, the higher the weight it gets.
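The same frequency count can be sketched for the area column. This is only an illustration on made-up data: it assumes the column is named sub_area, the output name sub_area_cnt is hypothetical, and the toy train/test frames stand in for the real ones in combine.

```python
import pandas as pd

# Toy stand-ins for the train/test frames; the values are made up.
train = pd.DataFrame({'sub_area': ['A', 'B', 'A', 'C', 'A']})
test = pd.DataFrame({'sub_area': ['B', 'B', 'A']})
combine = [train, test]

for df in combine:
    # Count how many rows fall into each area within this frame ...
    area_cnt_map = df['sub_area'].value_counts().to_dict()
    # ... and attach that count to every row as a numeric feature.
    df['sub_area_cnt'] = df['sub_area'].map(area_cnt_map)

print(train)
```

Note that, as in the loop above, the counts are computed separately for each frame, so the train and test values are not on the same scale unless the frames have similar sizes and distributions.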