Closed dfsnow closed 8 months ago
Constructing a price strata feature OR a lagged price feature doesn't work in this model.
To explain why, I'll focus on the lag price variant of the feature, since the strata feature is just a binned version of the lag price. The lag price feature is equal to whichever of the following is first available:
This construction is intended to act like an autoregressive feature in a time series model, i.e. we believe the predicted price is (to some extent) dependent on the prior price, so the prior price must be included as a feature.
This would work if all properties had sales. However, things get tricky when we need to construct the feature for the assessment set (the universe of all properties). There are two major problems:
So, scrapping this feature for now. I think in the future we should spend some time considering a better way to construct some kind of autoregressive feature.
This PR adds a "price point" or "market strata" model feature based on a property's prior year values. The goal of this feature is to roughly capture where a property lies in the distribution of price and to give the model a "hint" or starting point for prediction.
The feature is constructed by first determining the "strata price." This is the most up-to-date value available for a given property. Strata price is equal to the following value (whichever is available first), in descending order of preference:
The resulting strata price is then binned into N-tiles based on township and year. The binned N-tile is passed to the model as a categorical feature.
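As a rough illustration of the two steps above (take the first available value, then bin it within township and year), here is a minimal Python sketch. It is not the code in this PR: the field names `preferred_value` and `fallback_value`, the helper names, and the rank-based binning rule are all assumptions made for the example, and the sample values are made up.

```python
from collections import defaultdict
from typing import Optional, Sequence


def first_available(*candidates: Optional[float]) -> Optional[float]:
    """Return the first non-missing value, in descending order of preference."""
    for value in candidates:
        if value is not None:
            return value
    return None


def ntile_by_group(rows: Sequence[dict], n: int) -> list:
    """Assign each row a bin in 1..n from its rank within (township, year).

    Rows with no strata price keep a bin of None.
    """
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        if row["strata_price"] is not None:
            groups[(row["township"], row["year"])].append(idx)

    bins = [None] * len(rows)
    for indices in groups.values():
        ordered = sorted(indices, key=lambda i: rows[i]["strata_price"])
        size = len(ordered)
        for rank, i in enumerate(ordered):
            # Rank-based N-tile: split the sorted group into n near-equal parts.
            bins[i] = rank * n // size + 1
    return bins


# Hypothetical input: two candidate values per property, either may be missing.
rows = [
    {"township": "A", "year": 2023, "preferred_value": 100000.0, "fallback_value": 90000.0},
    {"township": "A", "year": 2023, "preferred_value": None, "fallback_value": 250000.0},
    {"township": "A", "year": 2023, "preferred_value": 400000.0, "fallback_value": None},
    {"township": "A", "year": 2023, "preferred_value": None, "fallback_value": 150000.0},
    {"township": "B", "year": 2023, "preferred_value": 50000.0, "fallback_value": None},
    {"township": "B", "year": 2023, "preferred_value": None, "fallback_value": None},
]

for row in rows:
    row["strata_price"] = first_available(row["preferred_value"], row["fallback_value"])

# The bin would be passed to the model as a categorical feature.
bins = ntile_by_group(rows, n=2)
print(bins)  # [1, 2, 2, 1, 1, None]
```

Note that the last row has no candidate value at all, so it gets no bin. The sketch leaves that case as `None` rather than guessing a strata.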
Pros:
Cons:
Closes #160.
CC @ccao-jardine