Test dedicated missingness imputation for main model

ccao-data / model-res-avm

Automated valuation model for all class 200 residential properties in Cook County (except vacant land and condos)

GNU Affero General Public License v3.0

I tested this a bunch locally using several flavors of the recipes::step_impute_*() functions. TL;DR: it doesn't make much difference for our outcomes. We don't have much missingness to begin with, and LightGBM seems to do a fine job of handling it natively.
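For reference, a minimal sketch of what the simpler imputation steps look like in a recipe. This is not the actual model pipeline: the toy data frame and its column names (sale_price, bldg_sf, ext_wall) are made up for illustration, and median/mode imputation is just one of the available flavors.

```r
library(recipes)

# Hypothetical toy data standing in for the real training set;
# column names are illustrative assumptions only
toy <- data.frame(
  sale_price = c(250000, 310000, 180000, 420000, 275000, 330000),
  bldg_sf    = c(1200, NA, 950, 2100, 1400, NA),
  ext_wall   = factor(c("brick", "frame", NA, "brick", "frame", "brick"))
)

# Build a recipe that fills numeric predictors with the column median
# and categorical predictors with the most common level
rec <- recipe(sale_price ~ ., data = toy)
rec <- step_impute_median(rec, all_numeric_predictors())
rec <- step_impute_mode(rec, all_nominal_predictors())

# Estimate the imputation values from the data, then apply them to it
imputed <- bake(prep(rec), new_data = NULL)
```

Swapping in another strategy should only require changing the step function, which makes it easy to compare each one against LightGBM's native handling of missing values.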

What I wasn't able to test were the more advanced imputation strategies, such as bagging and KNN. Each of them takes absolutely forever to run, even on a beefy m4/m5 AWS instance.

I'd say this is worth revisiting in the future, but it probably won't have a big immediate impact on model outcomes.

Test dedicated missingness imputation for main model #162