ccao-data / model-res-avm

Automated valuation model for all class 200 residential properties in Cook County (except vacant land and condos)
GNU Affero General Public License v3.0

Dfsnow/test cubist model #200

Closed: dfsnow closed this 5 months ago

dfsnow commented 5 months ago

This PR tests a Cubist model as an alternative to LightGBM and other GBDT models. It uses a simplified CV loop that only evaluates the Cubist results, rather than a full pipeline refactor.
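For readers unfamiliar with the idea, a simplified CV loop can be sketched roughly as below. This is an illustrative sketch only, not the code in this PR: it is written in plain Python, and `kfold_indices`, `cross_validate`, `fit_mean`, and `predict_mean` are hypothetical names. The placeholder model just predicts the training mean; a Cubist fit would be swapped in at that point.

```python
import math
import random
from statistics import mean


def kfold_indices(n_rows, n_folds, seed=0):
    """Shuffle row indices and split them into n_folds roughly equal folds."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def cross_validate(fit, predict, X, y, n_folds=5, seed=0):
    """Return the RMSE on each held-out fold for one candidate model."""
    scores = []
    for held_out in kfold_indices(len(y), n_folds, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        # Fit on the training rows only, then score on the held-out fold.
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in held_out])
        sq_err = [(p - y[i]) ** 2 for p, i in zip(preds, held_out)]
        scores.append(math.sqrt(mean(sq_err)))
    return scores


# Hypothetical placeholder model: predicts the mean of the training target.
def fit_mean(X_train, y_train):
    return mean(y_train)


def predict_mean(model, X_new):
    return [model for _ in X_new]
```

Calling `cross_validate(fit_mean, predict_mean, X, y)` returns one RMSE per fold, which is enough to compare a single model family without touching the rest of a pipeline.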

dfsnow commented 5 months ago

I tested this locally using two different sets of hyperparameters. Both performed well, better than a linear model but worse than the current main model. Unfortunately, the Cubist implementation here has two major drawbacks:

  1. It's single-threaded and takes forever to train. Even a model with simple hyperparameters takes around an hour.
  2. It uses a ton of memory when training.

Together, these two drawbacks made a full grid search impractical: a sequential search would take days, while a parallel search exhausts 250 GB of memory.
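The trade-off can be made concrete with a back-of-envelope estimate. The sketch below is illustrative only: `grid_search_cost` is a hypothetical helper, and the candidate count, fold count, and per-fit memory in the usage lines are assumed values, not measurements from this PR. Only the roughly one hour per fit comes from the comment above.

```python
import math


def grid_search_cost(n_candidates, n_folds, hours_per_fit, gb_per_fit, n_workers):
    """Estimate wall-clock time and peak memory for a grid search.

    Assumes every fit takes the same time and memory, and that each
    worker trains one single-threaded model at a time.
    """
    n_fits = n_candidates * n_folds
    rounds = math.ceil(n_fits / n_workers)
    return {
        "n_fits": n_fits,
        "wall_clock_hours": rounds * hours_per_fit,
        "peak_memory_gb": min(n_workers, n_fits) * gb_per_fit,
    }


# Assumed inputs: 20 candidates, 5 folds, 1 hour and 16 GB per fit.
sequential = grid_search_cost(20, 5, 1.0, 16, n_workers=1)
parallel = grid_search_cost(20, 5, 1.0, 16, n_workers=16)
```

Under these assumed inputs, the sequential search needs 100 fit-hours (a little over four days) at 16 GB, while 16 workers cut that to 7 hours but need 256 GB at peak, which is the kind of squeeze described above.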

Would like to play around with this in the future, but for now it's more trouble than it's worth.

Closes #37.