dssg / triage

General Purpose Risk Modeling and Prediction Toolkit for Policy and Social Good Problems

Collate produces redundant imputation flags #544

Closed ecsalomon closed 5 years ago

ecsalomon commented 5 years ago

The imputation flags for a categorical or quantity are the same within a given aggregation period, regardless of the aggregate function, so Collate produces a lot of redundant columns. For example, the following features will have exactly the same imputation flag columns:

Collate should add only one imputation column per quantity/categorical per aggregation period.
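To illustrate why the flags coincide, here is a minimal sketch in plain Python. It is not Collate's actual code. It assumes the flag is set whenever an entity has no values in the aggregation period and that missing aggregates are imputed with 0; the `events` data and the `aggregate_with_flag` helper are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data: values observed per entity within one aggregation period.
events: Dict[str, List[int]] = {"a": [3, 5], "b": [], "c": [2]}


def aggregate_with_flag(
    values: List[int], func: Callable[[List[int]], int]
) -> Tuple[int, int]:
    """Return (aggregate, imputation_flag).

    Assumption: an entity with no values in the period gets an imputed
    aggregate of 0 and a flag of 1; otherwise the flag is 0.
    """
    if not values:
        return 0, 1
    return func(values), 0


# The flag depends only on whether data exists, not on the aggregate function.
flags_min = {k: aggregate_with_flag(v, min)[1] for k, v in events.items()}
flags_max = {k: aggregate_with_flag(v, max)[1] for k, v in events.items()}
```

Under these assumptions `flags_min` and `flags_max` are equal for every entity, so a single flag column per quantity and aggregation period carries the same information.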

thcrock commented 5 years ago

I'm guessing we should name the _imp column similarly but without the aggregate function? e.g.

zip_code_features_zip_code_1year_num_events_min_imp
zip_code_features_zip_code_1year_num_events_max_imp

become

zip_code_features_zip_code_1year_num_events_imp?
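A rough sketch of that renaming, assuming the aggregate function is always the last token before the `_imp` suffix. The `AGG_FUNCS` tuple and both helper functions are hypothetical and not part of Collate:

```python
import re
from typing import Iterable, List, Sequence

# Assumed set of aggregate function names; extend as needed.
AGG_FUNCS: Sequence[str] = ("min", "max")


def shared_imp_name(column: str, funcs: Sequence[str] = AGG_FUNCS) -> str:
    """Drop the aggregate function from an imputation flag column name."""
    pattern = r"_(?:%s)_imp$" % "|".join(re.escape(f) for f in funcs)
    return re.sub(pattern, "_imp", column)


def dedupe_imp_columns(columns: Iterable[str]) -> List[str]:
    """Rename flag columns and keep one per quantity and period, in order."""
    seen = set()
    result = []
    for col in columns:
        name = shared_imp_name(col) if col.endswith("_imp") else col
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result


columns = [
    "zip_code_features_zip_code_1year_num_events_min_imp",
    "zip_code_features_zip_code_1year_num_events_max_imp",
]
deduped = dedupe_imp_columns(columns)
```

Here `deduped` contains the single name `zip_code_features_zip_code_1year_num_events_imp`. One caveat with suffix matching: a quantity whose own name ends in a function name (for example a hypothetical `temp_max`) would be ambiguous, so building the flag name directly from the quantity and period would be more robust than rewriting existing names.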