InseadDataAnalytics / INSEADAnalytics


Balancing data for training and holdout sets #152

Open rahuld1991 opened 6 years ago

rahuld1991 commented 6 years ago

When I separate my input into training and holdout sets, how can I be sure that the two are balanced? Currently I am creating box plots, for both training and holdout, of the major variables that I think will be important, to see whether the two sets have similar characteristics. Is there any other visual or statistical measure to check the "balancedness" of the data sets?
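
One possible numeric check to go alongside the box plots is sketched below. It is only an illustration, not something taken from this repository: for each variable it computes the standardized mean difference and the two-sample Kolmogorov-Smirnov statistic between the training and holdout values. The function names (`standardized_mean_difference`, `ks_statistic`, `balance_report`), the column names, and the synthetic data are all hypothetical, and Python is used here only for the sketch.

```python
import random
import statistics


def standardized_mean_difference(a, b):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
    diff = statistics.mean(a) - statistics.mean(b)
    if pooled_sd == 0:
        # Both samples are constant: balanced only if the constants match.
        return 0.0 if diff == 0 else float("inf")
    return diff / pooled_sd


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a_sorted) and j < len(b_sorted):
        x = min(a_sorted[i], b_sorted[j])
        # Advance both samples past x so ties are handled together.
        while i < len(a_sorted) and a_sorted[i] <= x:
            i += 1
        while j < len(b_sorted) and b_sorted[j] <= x:
            j += 1
        d = max(d, abs(i / len(a_sorted) - j / len(b_sorted)))
    return d


def balance_report(train, holdout, columns):
    """Return {column: (SMD, KS)} for rows stored as lists of dicts."""
    report = {}
    for col in columns:
        t = [row[col] for row in train]
        h = [row[col] for row in holdout]
        report[col] = (standardized_mean_difference(t, h), ks_statistic(t, h))
    return report


if __name__ == "__main__":
    # Hypothetical data: two made-up numeric variables, random 80/20 split.
    rng = random.Random(0)
    rows = [{"age": rng.gauss(40, 10), "income": rng.gauss(50, 15)}
            for _ in range(1000)]
    rng.shuffle(rows)
    train, holdout = rows[:800], rows[800:]
    for col, (smd, ks) in balance_report(train, holdout, ["age", "income"]).items():
        print(f"{col}: SMD={smd:.3f}, KS={ks:.3f}")
```

Values of both statistics close to zero suggest that a variable is distributed similarly in the two sets. An absolute standardized mean difference below roughly 0.1 is a commonly quoted rule of thumb for "balanced", but treat that cutoff as a convention rather than a guarantee. Overlaid histograms or empirical CDF plots of the same variables would be the visual counterpart to these numbers.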