Closed by bellecarrell 5 years ago
@abenton I added a method signature `train_dev_test` to analysis/file_data_util.py in case you start developing the ensemble before I finish the method.
*just edited. should be good now
Tested on dummy data and real data. Real data results: all users 746, train 437, dev 159, test 147.
@bellecarrell: Yes, stratify before splitting. For example:
1. Gather all "gastronomy" users in follower count bin 0.
2. Shuffle these users.
3. Take the first 60% and label them as train, the next 20% as dev, and the last 20% as test.
4. If the number of users is not evenly divisible by 5, take the remainder of users and assign each one to train/dev/test at random, using the above proportions as the assignment probabilities.
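The steps above can be sketched in Python roughly as follows. This is only an illustration, not the actual `train_dev_test` in analysis/file_data_util.py: the function name `stratified_split`, the `stratum_of` mapping, and the dummy user ids are all made up for the example. It assumes each user id maps to a (category, follower count bin) pair.

```python
import random
from collections import defaultdict

SPLITS = ("train", "dev", "test")
WEIGHTS = (3, 1, 1)  # 60% / 20% / 20%, expressed in fifths


def stratified_split(stratum_of, seed=0):
    """Hypothetical helper: return {user_id: split} with a 60/20/20 split
    inside each stratum, e.g. ("gastronomy", 0)."""
    rng = random.Random(seed)

    # Group users by stratum (category, follower count bin).
    strata = defaultdict(list)
    for user, stratum in stratum_of.items():
        strata[stratum].append(user)

    assignment = {}
    for stratum in sorted(strata):
        members = strata[stratum]
        rng.shuffle(members)

        # Largest prefix whose size is divisible by 5 gets an exact split.
        fifth = len(members) // 5
        cut_train, cut_dev, cut_test = 3 * fifth, 4 * fifth, 5 * fifth
        for user in members[:cut_train]:
            assignment[user] = "train"
        for user in members[cut_train:cut_dev]:
            assignment[user] = "dev"
        for user in members[cut_dev:cut_test]:
            assignment[user] = "test"

        # Leftover users (at most 4 per stratum) are assigned at random
        # with probabilities 0.6 / 0.2 / 0.2.
        for user in members[cut_test:]:
            assignment[user] = rng.choices(SPLITS, weights=WEIGHTS)[0]
    return assignment


# Dummy data: 23 "gastronomy" users spread over two follower count bins.
dummy = {f"user{i}": ("gastronomy", i % 2) for i in range(23)}
assignment = stratified_split(dummy, seed=13)
```

Because the split is done per stratum, the overall counts only approximate 60/20/20; the leftover users in each stratum add some random variation on top of the exact per-stratum split.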