ccao-data / model-res-avm

Automated valuation model for all class 200 residential properties in Cook County (except vacant land and condos)
GNU Affero General Public License v3.0
20 stars 3 forks

Create a feature selection/evaluation template #249

Open dfsnow opened 5 days ago

dfsnow commented 5 days ago

The current CCAO process for evaluating new model features is very ad hoc. We typically look at the change in model performance metrics before and after adding a feature, as well as the absolute SHAP values associated with that feature.
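As a rough illustration of the before/after comparison described above, here is a minimal Python sketch. It assumes RMSE as the performance metric and plain lists of predictions and SHAP values; the function names (`rmse`, `mean_abs_shap`, `evaluate_feature`) and the toy numbers are hypothetical and are not taken from the CCAO pipeline.

```python
from math import sqrt
from statistics import mean


def rmse(actual, predicted):
    """Root mean squared error between sale prices and model predictions."""
    return sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def mean_abs_shap(shap_values):
    """Average magnitude of one feature's SHAP values across all PINs."""
    return mean(abs(v) for v in shap_values)


def evaluate_feature(actual, pred_before, pred_after, shap_values):
    """Summarize the two ad hoc checks for one candidate feature."""
    before = rmse(actual, pred_before)
    after = rmse(actual, pred_after)
    return {
        "rmse_before": before,
        "rmse_after": after,
        "rmse_change": after - before,  # negative means the feature helped
        "mean_abs_shap": mean_abs_shap(shap_values),
    }


# Toy numbers for illustration only, not real CCAO data.
summary = evaluate_feature(
    actual=[100, 200, 300],
    pred_before=[110, 190, 330],
    pred_after=[100, 210, 300],
    shap_values=[0.5, -0.5, 1.0],
)
print(summary)
```

Any other metric could be swapped in for `rmse` without changing the shape of the comparison.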

In order to make this ad-hoc process slightly more repeatable and rigorous, we should create a template Quarto document we can use to evaluate new features. This document should contain both standard, repeatable sections (e.g. model performance stats by township) and a series of questions that will likely require additional ad-hoc analysis. For example, given a question like "Where is the new model feature most impactful?" and a feature that adds distance to stadium, one might add maps of PIN-level SHAP values surrounding each stadium.
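One of the standard sections could be a per-township table of the metric before and after the new feature. The sketch below shows one possible shape for it, assuming each row is a dict with hypothetical keys `township`, `sale_price`, `pred_before`, and `pred_after`, and again assuming RMSE as the metric. The township labels and values are placeholders.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def rmse_by_township(rows, pred_key):
    """Group rows by township and compute RMSE of `pred_key` against sale price."""
    squared_errors = defaultdict(list)
    for row in rows:
        error = row[pred_key] - row["sale_price"]
        squared_errors[row["township"]].append(error ** 2)
    return {town: sqrt(mean(errs)) for town, errs in squared_errors.items()}


def township_report(rows):
    """Per-township RMSE before and after the feature, plus the change."""
    before = rmse_by_township(rows, "pred_before")
    after = rmse_by_township(rows, "pred_after")
    return {
        town: {
            "before": before[town],
            "after": after[town],
            "change": after[town] - before[town],  # negative = improvement
        }
        for town in sorted(before)
    }


# Placeholder rows for illustration only, not real sales.
rows = [
    {"township": "Township A", "sale_price": 100, "pred_before": 110, "pred_after": 104},
    {"township": "Township A", "sale_price": 200, "pred_before": 190, "pred_after": 197},
    {"township": "Township B", "sale_price": 300, "pred_before": 305, "pred_after": 305},
]
report = township_report(rows)
for town, stats in report.items():
    print(f"{town}: before={stats['before']:.2f} "
          f"after={stats['after']:.2f} change={stats['change']:+.2f}")
```

A table like this makes it easy to spot a feature that only matters in a few townships, which is exactly the kind of feature we want to keep.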

Goal

The goal here is to remove (or exclude in the first place) features which have no predictive power in any geography (i.e. they are merely noise). The goal is not to remove features which may be redundant, only mildly predictive, or only predictive in certain geographic areas; all of this work is done more-or-less automatically by the model, which is regularized and performs other forms of de-facto feature selection.
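To make the "no predictive power in any geography" criterion concrete, one possible check is to flag a feature only when its mean absolute SHAP value is negligible in every area. The sketch below assumes `(area, shap_value)` pairs as input; the function names and the `threshold` value are hypothetical and would need to be chosen per model.

```python
from collections import defaultdict
from statistics import mean


def mean_abs_shap_by_area(rows):
    """Mean |SHAP| of one feature within each geographic area."""
    by_area = defaultdict(list)
    for area, shap_value in rows:
        by_area[area].append(abs(shap_value))
    return {area: mean(values) for area, values in by_area.items()}


def is_noise_everywhere(rows, threshold):
    """True only if the feature's mean |SHAP| is below `threshold` in all areas."""
    per_area = mean_abs_shap_by_area(rows)
    if not per_area:
        raise ValueError("no SHAP values supplied")
    return all(value < threshold for value in per_area.values())


# Toy values for illustration only: negligible in area A, impactful in area B.
rows = [("A", 0.001), ("A", -0.002), ("B", 0.5), ("B", -0.3)]
print(mean_abs_shap_by_area(rows))
print(is_noise_everywhere(rows, threshold=0.01))
```

Using `all` rather than an overall average keeps features that are only predictive in certain geographic areas, in line with the goal stated above.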

Task

Create a Quarto document at analyses/new-feature-template.qmd that can be used to evaluate whether or not new features are merely noise. The document should:

This document can be copied and then renamed for each new feature added, similar to the workflow for enterprise intelligence.

@ccao-jardine

ccao-jardine commented 3 days ago

Excellent. Let's add to the scope of templated content: