Diagnostics: "Challenge groups"

ccao-data / model-res-avm

Automated valuation model for all class 200 residential properties in Cook County (except vacant land and condos)

GNU Affero General Public License v3.0


Goal: for each model run, examine the model's performance on groups that have historically been challenging for the model and/or are of special interest. These groups have typically needed analyst desk review to resolve. The purpose of these "Challenge groups" diagnostics is to see how different models might reduce the need for desk review.

We'll use the existing model performance doc as a launch point, but this will be a new standalone report per run.

Challenge groups of interest:

Baseline? (for comparison to challenge groups)
Properties with large lots (ind_land_gte_95_percentile)
Multi-card properties (flag_pin_is_multicard)
Multi-family (meta_class = 211)
Missing characteristics (flag_char_missing_critical_value)
High sale price (flag_prior_near_fmv_top_decile, though note this is the top decile of the township)
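
As a rough illustration of how the groups above could be tagged, here is a minimal Python sketch (the project itself may do this differently). The column names are the ones listed above; the group labels, the assumption that each flag is truthy when set, and the definition of "Baseline" as any property in no challenge group are all assumptions for illustration.

```python
# Hypothetical predicates keyed by group label. Column names come from the
# list above; treating each flag as truthy-when-set is an assumption.
CHALLENGE_GROUPS = {
    "Large lot": lambda r: bool(r["ind_land_gte_95_percentile"]),
    "Multi-card": lambda r: bool(r["flag_pin_is_multicard"]),
    "Multi-family": lambda r: str(r["meta_class"]) == "211",
    "Missing characteristics": lambda r: bool(r["flag_char_missing_critical_value"]),
    "High sale price": lambda r: bool(r["flag_prior_near_fmv_top_decile"]),
}


def groups_for(row):
    """Return every challenge group a property belongs to.

    A property can fall into several groups at once. Properties matching no
    group are labeled "Baseline" (an assumed definition, not a confirmed one).
    """
    matched = [name for name, pred in CHALLENGE_GROUPS.items() if pred(row)]
    return matched or ["Baseline"]
```

Returning a list rather than a single label keeps overlapping groups (for example, a multi-card 211) visible in each group's counts.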

For each challenge group, for the whole City tri, here's what I'm thinking:

Topline summary: N, median prior value, median current estimated FMV, median % change, and a boxplot of % change, or possibly a scatterplot (x = prior MV, y = estimated MV)?
Topline ratio stats: N sales, median FMV, median sale price, ratio stats per challenge group
Leaflet map, dot color to indicate % change
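
To make the topline summary concrete, here is a minimal Python sketch of the four numbers for one group. The keys `prior_value` and `pred_fmv` are hypothetical names, not columns confirmed by the model output, and % change is assumed to be measured relative to the prior value.

```python
from statistics import median


def topline_summary(rows):
    """Compute N, median prior value, median estimated FMV, and median % change.

    `rows` is a list of dicts with hypothetical keys `prior_value` and
    `pred_fmv`. Percent change is computed per property, then the median is
    taken, so it is not the % change between the two medians.
    """
    pct_change = [
        100 * (r["pred_fmv"] - r["prior_value"]) / r["prior_value"] for r in rows
    ]
    return {
        "n": len(rows),
        "median_prior_value": median([r["prior_value"] for r in rows]),
        "median_pred_fmv": median([r["pred_fmv"] for r in rows]),
        "median_pct_change": median(pct_change),
    }
```

Running this once per challenge group and once for the baseline would give one comparable row per group.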

cc @dfsnow

Great work so far. Next steps:

[x] Let's add intro text to explain each of these categories. You'll probably have to do some digging... for example, here's how flag_char_missing_critical_value is initially constructed. Please also clarify in the initial text any geographic specs (e.g., clarify that the stats are countywide), and any other constraints on the data (e.g., which years of sales are we using?)
[x] The current topline table ("Difficult Categories") covers only sold properties. Let's split it into two sections, one for each universe:
[x] Universe of all properties: N, median prior value, median current estimated FMV, median % change
[x] Universe of sold properties, ratio stats: N sales, median FMV, median sale price, ratio stats per challenge group
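
For the sold-property universe, the ratio stats could be sketched as below. The issue does not say which ratio statistics are meant; this sketch assumes the median ratio plus two standard sales ratio study measures, the coefficient of dispersion (COD) and the price-related differential (PRD). The function and argument names are hypothetical.

```python
from statistics import mean, median


def ratio_stats(fmv, sale_price):
    """Sales ratio statistics for one group of sold properties.

    ratio = estimated FMV / sale price
    COD   = 100 * mean(|ratio - median ratio|) / median ratio
    PRD   = mean ratio / (sum of FMV / sum of sale prices)
    """
    ratios = [f / s for f, s in zip(fmv, sale_price)]
    med = median(ratios)
    cod = 100 * mean([abs(r - med) for r in ratios]) / med
    prd = mean(ratios) / (sum(fmv) / sum(sale_price))
    return {
        "n_sales": len(ratios),
        "median_fmv": median(fmv),
        "median_sale_price": median(sale_price),
        "median_ratio": med,
        "cod": cod,
        "prd": prd,
    }
```

A lower COD means more uniform ratios within the group, and a PRD above 1 suggests higher-priced properties are valued at lower ratios than lower-priced ones, so both are useful when comparing a challenge group against the baseline.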

Diagnostics: "Challenge groups" #189