We briefly discussed how to treat the excessive true zeros that remain in features even after postprocessing and imputation. The fraction of zeros per feature gives us a good reference for deciding how to handle them.
Before imputation, the calculation results contain 3858 columns, including `site_id`, `time`, and the event flag. Five lagged features will be added after imputation.
The table below summarizes the number of features with zeros up to a certain percentile:

| Percentile | Number of features |
| --- | --- |
| 50 | 2218 |
| 60 | 2174 |
| 70 | 2100 |
| 80 | 2031 |
| 90 | 1881 |
| 100 | 1247 |
Note that zero-only features were excluded before imputation.
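
For reference, here is a minimal sketch of how a per-feature zero share could be tallied. It is not the code that produced the table above. It assumes one possible reading of the table (a feature is counted at a threshold when its share of zeros is at or above that percentage), and the column name `event`, the helper names, and the toy data are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical identifier columns to skip; "event" stands in for the event flag.
ID_COLUMNS = {"site_id", "time", "event"}


def zero_count(values: Sequence[Optional[float]]) -> Tuple[int, int]:
    """Return (number of exact zeros, number of non-missing entries)."""
    observed = [v for v in values if v is not None]
    zeros = sum(1 for v in observed if v == 0)
    return zeros, len(observed)


def count_features_by_zero_share(
    columns: Dict[str, List[Optional[float]]],
    thresholds_pct: Sequence[int],
) -> Dict[int, int]:
    """Count features whose share of zeros is at or above each threshold (in percent)."""
    counts = {
        name: zero_count(vals)
        for name, vals in columns.items()
        if name not in ID_COLUMNS
    }
    result = {}
    for p in thresholds_pct:
        # Integer comparison (zeros / n >= p / 100) avoids floating-point edge cases.
        result[p] = sum(
            1 for zeros, n in counts.values() if n > 0 and zeros * 100 >= p * n
        )
    return result


# Toy data, made up for illustration only.
columns = {
    "site_id": [1, 2, 3, 4],
    "time": [0, 0, 0, 0],
    "event": [0, 1, 0, 0],
    "feat_a": [0, 0, 0, 0],  # 100% zeros
    "feat_b": [0, 0, 0, 5],  # 75% zeros
    "feat_c": [0, 0, 1, 2],  # 50% zeros
    "feat_d": [3, 4, 1, 2],  # 0% zeros
}
summary = count_features_by_zero_share(columns, [50, 80, 100])
print(summary)
```

If the table was built with a different definition (for example, features whose p-th percentile value equals zero), only the comparison inside the loop would need to change.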
@kyle-messier