dfsnow closed this issue 5 months ago
We've tested a few different combinations of geographic features + treatments of geographic variables:
2024-01-16-cranky-sam
2024-01-17-nostalgic-christian
2024-01-17-eager-boni
2024-01-17-practical-billy
It seems like changing these predictors is mostly tinkering around the margins. Notably, removing or changing one location predictor causes others to "pick up the slack." For instance, removing all location features makes median income (a proxy for location) become more important. As such, I don't think the ROI on these changes is high enough to pursue them further, and I'm closing this issue.
Currently, the residential AVM relies on a combination of township, neighborhood, and lat/lon to determine the value of location. These features tend to be among the most important in the model. However, they are not always well-defined or relevant to price. Neighborhoods can be too large (or too small), and township boundaries are mostly arbitrary. We should test some smaller units of geography as geographic features.
One interesting thing to try: currently the model doesn't have a way to measure neighborhood proximity, i.e., it doesn't know that two neighborhoods are close together. If we numerically order the neighborhoods and treat them as numeric, rather than categorical, predictors, then the model might be able to group neighborhoods by their relative proximity.
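
A minimal sketch of the numeric-ordering idea, in Python for illustration only. It assumes each neighborhood has a centroid and ranks neighborhoods by latitude, then longitude. The function name `order_neighborhoods`, the neighborhood codes, and the coordinates are all hypothetical. The ordering rule is just one possible choice and not necessarily the one this issue has in mind.

```python
from typing import Dict, Tuple


def order_neighborhoods(centroids: Dict[str, Tuple[float, float]]) -> Dict[str, int]:
    """Map each neighborhood code to an integer rank.

    Neighborhoods are sorted by centroid (latitude first, then longitude),
    so codes with nearby ranks have similar latitudes. This is an assumed
    ordering rule, chosen only to illustrate the idea.
    """
    ranked = sorted(centroids, key=lambda code: centroids[code])
    return {code: rank for rank, code in enumerate(ranked)}


# Hypothetical neighborhood codes with made-up (lat, lon) centroids.
centroids = {
    "A": (41.90, -87.65),
    "B": (41.70, -87.60),
    "C": (41.91, -87.66),
}

# The rank replaces the categorical code as a numeric predictor.
nbhd_rank = order_neighborhoods(centroids)
print(nbhd_rank)
```

With a numeric rank, a tree-based model can split on a threshold and so group a contiguous run of neighborhoods in a single split. The caveat is that any one-dimensional ordering only partly preserves two-dimensional proximity: two neighborhoods with adjacent ranks here may still be far apart in longitude.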