ccao-data / model-res-avm

Automated valuation model for all class 200 residential properties in Cook County (except vacant land and condos)
GNU Affero General Public License v3.0

Test smaller / different geographic features #166

Closed: dfsnow closed this issue 5 months ago

dfsnow commented 5 months ago

Currently, the residential AVM relies on a combination of township, neighborhood, and lat/lon to determine the value of location. These features tend to be among the most important in the model. However, they are not always well-defined or relevant to price. Neighborhoods can be too large (or too small), and township boundaries are mostly arbitrary. We should test some smaller units of geography as location features.

One interesting thing to try: currently the model doesn't have a way to measure neighborhood proximity, i.e. it doesn't know when two neighborhoods are close together. If we order the neighborhoods numerically and treat them as numeric, rather than categorical, predictors, then the model might be able to group neighborhoods by their relative proximity.
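A rough sketch of what that numeric ordering could look like, in plain Python. Everything here is hypothetical and for illustration only: the neighborhood codes, the centroid coordinates, and the helper names `order_by_proximity` and `encode_numeric` are not from the model repo. It uses a greedy nearest-neighbor chain, which is just one possible way to produce an ordering where adjacent integers tend to mean nearby neighborhoods.

```python
from math import hypot

# Hypothetical neighborhood centroids (code -> (x, y)); not real CCAO data.
centroids = {
    "A": (0.0, 0.0),
    "B": (5.0, 5.0),
    "C": (0.1, 0.2),
    "D": (5.2, 5.1),
}


def order_by_proximity(centroids, start):
    """Greedy chain: repeatedly hop to the closest unvisited centroid."""
    order = [start]
    remaining = set(centroids) - {start}
    while remaining:
        cx, cy = centroids[order[-1]]
        # sorted() makes tie-breaking deterministic (alphabetical by code).
        nxt = min(
            sorted(remaining),
            key=lambda k: hypot(centroids[k][0] - cx, centroids[k][1] - cy),
        )
        order.append(nxt)
        remaining.remove(nxt)
    return order


def encode_numeric(centroids, start):
    """Map each neighborhood code to its integer rank in the ordering."""
    ordering = order_by_proximity(centroids, start)
    return {code: rank for rank, code in enumerate(ordering)}


# The resulting integers would replace the categorical neighborhood code.
nbhd_rank = encode_numeric(centroids, start="A")
```

Note that any one-dimensional ordering can only approximate two-dimensional proximity: some neighborhoods that border each other will inevitably end up with distant ranks, so tree splits on this feature would capture proximity only partially.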

dfsnow commented 5 months ago

We've tested a few different combinations of geographic features and treatments of geographic variables.

It seems like changing these predictors is mostly tinkering around the margins. Notably, removing or changing one location predictor causes others to "pick up the slack." For instance, removing all location features makes median income (a proxy for location) become more important. As such, I don't think the ROI on these changes is high enough to pursue them further, and I'm closing this issue.