ccao-data / model-res-avm

Automated valuation model for all class 200 residential properties in Cook County (except vacant land and condos)
GNU Affero General Public License v3.0

Switch cross-validation to V-fold instead of time-based #116

Closed (dfsnow closed this 9 months ago)

dfsnow commented 9 months ago

After a lot of experimentation (via #115), I think we should just use the simplest cross-validation scheme we can, so this PR switches from time-based splits to plain V-fold.

The new splitting strategy is a 90% train / 10% test split, then 5 folds of 80% train / 20% validation, with 10% of the training set held out for early stopping.
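For anyone who wants to check the proportions, here is a minimal sketch of that splitting scheme. It is illustrative Python using only the standard library, not the code in this PR. The function name `vfold_splits`, its parameters, and the dictionary keys are all hypothetical. It also assumes the 10% early-stopping holdout is carved out of each fold's training portion, which is one possible reading of the description above.

```python
import random


def vfold_splits(n_rows, test_frac=0.10, v=5, early_stop_frac=0.10, seed=42):
    """Return (train, test, folds) as lists of row indices.

    Hypothetical helper: random (not time-based) 90/10 train/test split,
    then V folds over the training rows.
    """
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)  # random assignment, no ordering by sale date

    # 90% train / 10% test
    n_test = round(n_rows * test_frac)
    test, train = idx[:n_test], idx[n_test:]

    folds = []
    for k in range(v):
        # Every v-th training row goes to this fold's validation set (~20% for v=5)
        validation = train[k::v]
        held_out = set(validation)
        analysis = [i for i in train if i not in held_out]

        # Assumption: early-stopping rows come from the fold's training portion
        n_stop = round(len(analysis) * early_stop_frac)
        folds.append(
            {
                "fit": analysis[n_stop:],
                "early_stop": analysis[:n_stop],
                "validation": validation,
            }
        )
    return train, test, folds
```

With 1,000 rows this gives 900 train and 100 test rows, and each of the 5 folds has 180 validation rows, 72 early-stopping rows, and 648 rows left for fitting.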

Here's a complete CV run using the new splits. It's very similar to the time-split results, if slightly better.

dfsnow commented 9 months ago

@wrridgeway, yes, this is effectively just for parsimony, and to avoid the confusion around time-based train/validation early-stopping sets in Lightsnip.