Diyago / kaggle-malware

https://www.kaggle.com/c/microsoft-malware-prediction/
Validation strategy #5

Open Diyago opened 5 years ago

Diyago commented 5 years ago

If you sort the data by AvSigVersion (as a numeric value) and split the train data into fit and validation sets as past and future observations, the CV score comes out almost the same as the LB score. This happens because 84% of the observations in the test data lie in the future by the AvSigVersion column. For example, with a 60/40 split I get a CV of about 0.69, and with an 80/20 split about 0.70.
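
A minimal sketch of that past/future split, in plain Python. It assumes AvSigVersion is a dotted string of integers, and "as numeric" is read here as comparing the parts as a tuple of ints; malformed values would need cleaning first. The helper names `version_key` and `time_split` and the toy rows are made up for illustration.

```python
def version_key(version):
    """Turn a dotted version string such as '1.273.1420.0' into a tuple of ints."""
    # Assumption: every part is a plain integer; int() raises ValueError otherwise.
    return tuple(int(part) for part in version.split("."))


def time_split(rows, fit_fraction=0.8, column="AvSigVersion"):
    """Sort rows by numeric version, then split into past (fit) and future (val)."""
    ordered = sorted(rows, key=lambda row: version_key(row[column]))
    cut = int(len(ordered) * fit_fraction)
    return ordered[:cut], ordered[cut:]


# Toy rows standing in for the train data (values are made up).
rows = [
    {"AvSigVersion": "1.273.1420.0", "HasDetections": 1},
    {"AvSigVersion": "1.263.48.0", "HasDetections": 0},
    {"AvSigVersion": "1.275.98.0", "HasDetections": 1},
    {"AvSigVersion": "1.273.337.0", "HasDetections": 0},
    {"AvSigVersion": "1.237.0.0", "HasDetections": 1},
]

fit, val = time_split(rows, fit_fraction=0.6)
print([r["AvSigVersion"] for r in fit])  # oldest versions
print([r["AvSigVersion"] for r in val])  # newest versions
```

Sorting the raw strings would put "1.273.1420.0" before "1.273.337.0", which is why the parts are converted to integers before comparing.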

Adversarial Validation (usage example here: https://www.kaggle.com/tunguz/elo-adversarial-validation). If the local CV is within 0.04 of the public LB, that is fine.
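
Full adversarial validation trains a classifier on all features to tell train rows from test rows and looks at its AUC. The sketch below is a simplified stand-in, not the approach from the linked notebook: it uses a single feature (AvSigVersion) and computes a rank-based AUC directly, with no model. The function names and the toy version lists are hypothetical.

```python
def version_key(version):
    """Turn a dotted version string into a tuple of ints (assumes integer parts)."""
    return tuple(int(part) for part in version.split("."))


def rank_auc(train_values, test_values):
    """Probability that a random test value is greater than a random train value.

    Ties count as half a win. This equals the AUC of a 'classifier' whose score
    is the feature itself, with train labelled 0 and test labelled 1.
    """
    wins = 0.0
    for t in test_values:
        for s in train_values:
            if t > s:
                wins += 1.0
            elif t == s:
                wins += 0.5
    return wins / (len(train_values) * len(test_values))


# Toy data (made up): test versions are mostly newer than train versions.
train_versions = ["1.237.0.0", "1.263.48.0", "1.273.337.0", "1.275.98.0"]
test_versions = ["1.273.1420.0", "1.275.1140.0", "1.277.50.0"]

auc = rank_auc(
    [version_key(v) for v in train_versions],
    [version_key(v) for v in test_versions],
)
print(round(auc, 3))
```

A value near 0.5 would mean the feature does not separate train from test; a value near 1.0 means test rows are mostly later than train rows, which is the situation where a past/future split is the more honest validation scheme. The pairwise loop is quadratic, so on real data a sort-based rank computation or a library AUC function would be the practical choice.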