Goal: for each model run, examine the model's performance on groups that have previously been challenging for the model and/or are of special interest. Historically, these groups have needed analyst desk review to resolve. The purpose of these "challenge group" diagnostics is to see how different models might reduce the need for desk review.
We'll use the existing model performance doc as a launch point, but this will be a new standalone report per run.
Challenge groups of interest:
Baseline? (for comparison to challenge groups)
Properties with large lots (ind_land_gte_95_percentile)
High sale price (flag_prior_near_fmv_top_decile, though note this is the top decile of the township)
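To make the township caveat concrete, here is a minimal sketch of how a top-decile flag computed within each township differs from one computed across all properties. This is an illustration only, not the actual construction of flag_prior_near_fmv_top_decile: the `township` and `prior_near_fmv` field names, the toy values, and the inclusive quantile method are all assumptions.

```python
from collections import defaultdict
from statistics import quantiles


def top_decile_cutoff(values):
    """90th percentile cut point (last of the nine decile boundaries)."""
    return quantiles(values, n=10, method="inclusive")[-1]


def flag_top_decile_by_township(rows):
    """Flag rows at or above the 90th percentile of their own township."""
    by_township = defaultdict(list)
    for r in rows:
        by_township[r["township"]].append(r["prior_near_fmv"])
    cutoffs = {t: top_decile_cutoff(v) for t, v in by_township.items()}
    return [r["prior_near_fmv"] >= cutoffs[r["township"]] for r in rows]


def flag_top_decile_overall(rows):
    """Flag rows at or above the 90th percentile of all rows pooled together."""
    cutoff = top_decile_cutoff([r["prior_near_fmv"] for r in rows])
    return [r["prior_near_fmv"] >= cutoff for r in rows]


# Hypothetical toy data: township "A" is low-value, township "B" is high-value.
rows = [{"township": "A", "prior_near_fmv": 100 * i} for i in range(1, 11)]
rows += [{"township": "B", "prior_near_fmv": 1000 * i} for i in range(1, 11)]

township_flags = flag_top_decile_by_township(rows)
overall_flags = flag_top_decile_overall(rows)

# Townships represented among the flagged rows under each definition.
township_hits = sorted({r["township"] for r, f in zip(rows, township_flags) if f})
overall_hits = sorted({r["township"] for r, f in zip(rows, overall_flags) if f})
print(township_hits, overall_hits)
```

With a township-relative cutoff, every township contributes properties to the group, including low-value townships whose top properties would not clear a pooled cutoff. That is worth keeping in mind when reading this group's stats.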
For each challenge group, for the whole City tri, here's what I'm thinking:
Topline summary: N, median prior value, median current estimated FMV, median % change. Boxplot of % change, or scatterplot (x = prior MV, y = estimated MV)?
Topline ratio stats: N sales, median FMV, median sale price, ratio stats per challenge group
[x] Let's add intro text to explain each of these categories. You'll probably have to do some digging... for example, here's how flag_char_missing_critical_value is initially constructed. Please also clarify in the initial text any geographic specs (e.g., clarify that the stats are countywide), and any other constraints on the data (e.g., which years of sales are we using?)
[x] The current topline table ("Difficult Categories") covers only sold properties. Let's split it into two sections, for two universes:
[x] Universe of all properties: N, median prior value, median current estimated FMV, median % change
[x] Universe of sold properties, ratio stats: N sales, median FMV, median sale price, ratio stats per challenge group
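A rough sketch of what the two tables could compute for a single challenge group, assuming one record per property. The `prior_value`, `est_fmv`, and `sale_price` field names and the toy values are hypothetical, and I'm assuming "ratio stats" means median ratio, COD, and PRD under their standard definitions; swap in whatever the existing performance doc uses.

```python
from statistics import mean, median

# Toy data. Field names other than the flag column are hypothetical.
PROPERTIES = [
    {"prior_value": 100_000, "est_fmv": 110_000, "sale_price": 100_000,
     "ind_land_gte_95_percentile": True},
    {"prior_value": 200_000, "est_fmv": 180_000, "sale_price": 200_000,
     "ind_land_gte_95_percentile": True},
    {"prior_value": 400_000, "est_fmv": 400_000, "sale_price": None,
     "ind_land_gte_95_percentile": True},
    {"prior_value": 150_000, "est_fmv": 165_000, "sale_price": 150_000,
     "ind_land_gte_95_percentile": False},
]


def summarize_all(rows):
    """Universe of all properties: N, median values, median % change."""
    if not rows:
        return None
    pct_change = [(r["est_fmv"] - r["prior_value"]) / r["prior_value"] for r in rows]
    return {
        "n": len(rows),
        "median_prior_value": median([r["prior_value"] for r in rows]),
        "median_est_fmv": median([r["est_fmv"] for r in rows]),
        "median_pct_change": median(pct_change),
    }


def summarize_sold(rows):
    """Universe of sold properties: N sales, medians, and ratio stats."""
    sold = [r for r in rows if r["sale_price"] is not None]
    if not sold:
        return None
    ratios = [r["est_fmv"] / r["sale_price"] for r in sold]
    med_ratio = median(ratios)
    # Sale-weighted mean ratio: total estimated value over total sale price.
    weighted_mean = sum(r["est_fmv"] for r in sold) / sum(r["sale_price"] for r in sold)
    return {
        "n_sales": len(sold),
        "median_est_fmv": median([r["est_fmv"] for r in sold]),
        "median_sale_price": median([r["sale_price"] for r in sold]),
        "median_ratio": med_ratio,
        # COD: average absolute deviation from the median ratio, as a % of it.
        "cod": 100 * mean([abs(x - med_ratio) for x in ratios]) / med_ratio,
        # PRD: mean ratio divided by the sale-weighted mean ratio.
        "prd": mean(ratios) / weighted_mean,
    }


group = [r for r in PROPERTIES if r["ind_land_gte_95_percentile"]]
all_stats = summarize_all(group)
sold_stats = summarize_sold(group)
print(all_stats)
print(sold_stats)
```

Keeping the two universes as separate functions makes it explicit that the first table's N counts every property in the group while the second counts only those with a sale.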
cc @dfsnow