dssg / triage

General Purpose Risk Modeling and Prediction Toolkit for Policy and Social Good Problems

Create features per fold #897

Open ecsalomon opened 2 years ago

ecsalomon commented 2 years ago

Ideally, Triage would create features separately for each temporal fold, so that columns that would not have been available at training time are excluded from the training and test matrices for models built on that fold. This applies to both quantitative aggregates and categorical choice aggregates, and it includes columns that would not have met the conditions of the choice query at training time (e.g., at least 1000 examples of the choice). Implementing this behavior raises some questions: