train dev test split

bellecarrell / twitter_brand

In developing a brand on Twitter (and social media in general), how does what you say and how you say it correspond to positive results (more followers, for example)?


train dev test split #89

Closed: bellecarrell closed this issue 5 years ago

bellecarrell commented 5 years ago

@bellecarrell: Yes, stratify before splitting. For example:

1. Gather all "gastronomy" users in follower count bin 0.
2. Shuffle these users.
3. Label the first 60% as train, the next 20% as dev, and the last 20% as test.
4. If the number of users is not evenly divisible by 5, randomly assign the leftover users to train/dev/test with the same probabilities (0.6 / 0.2 / 0.2).
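A minimal Python sketch of the steps above, applied to every (category, follower bin) stratum instead of only gastronomy / bin 0. The input format (tuples of user id, category, follower bin), the name `stratified_split`, and the fixed seed are assumptions for illustration; this is a hypothetical helper, not the project's actual implementation.

```python
import random
from collections import defaultdict

SPLITS = ("train", "dev", "test")
PROBS = (0.6, 0.2, 0.2)


def stratified_split(users, seed=0):
    """Hypothetical helper. users: iterable of (user_id, category, follower_bin).

    Returns a dict mapping user_id -> "train" | "dev" | "test".
    """
    rng = random.Random(seed)

    # Group users into strata keyed by (category, follower bin).
    strata = defaultdict(list)
    for user_id, category, follower_bin in users:
        strata[(category, follower_bin)].append(user_id)

    assignment = {}
    # Iterate strata in sorted order so a fixed seed gives a reproducible split.
    for key in sorted(strata):
        ids = strata[key]
        rng.shuffle(ids)

        # The largest multiple of 5 splits exactly 60/20/20.
        n_even = len(ids) - len(ids) % 5
        n_train = n_even * 3 // 5
        n_dev = n_even // 5

        for uid in ids[:n_train]:
            assignment[uid] = "train"
        for uid in ids[n_train:n_train + n_dev]:
            assignment[uid] = "dev"
        for uid in ids[n_train + n_dev:n_even]:
            assignment[uid] = "test"

        # Leftover users (fewer than 5) are assigned at random with the same probabilities.
        for uid in ids[n_even:]:
            assignment[uid] = rng.choices(SPLITS, weights=PROBS)[0]

    return assignment
```

For example, `stratified_split([(f"u{i}", "gastronomy", 0) for i in range(10)])` puts exactly 6 users in train, 2 in dev, and 2 in test, since 10 is divisible by 5. Shuffling within each stratum before slicing keeps every category/bin combination represented in all three splits in roughly the 60/20/20 ratio.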

bellecarrell commented 5 years ago

@abenton I added a method signature train_dev_test to analysis/file_data_util.py in case you're developing the ensemble before I finish the method.

bellecarrell commented 5 years ago

*Just edited; should be good now.

bellecarrell commented 5 years ago

Tested on dummy data and real data. Real data results: all users 746, train 437, dev 159, test 147.
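A sanity check of the kind one might run when testing a split on dummy data, sketched in Python. The name `check_split` and the dict-of-user-to-split input are assumptions for illustration; this is not the test the thread refers to. It confirms that every user received a split and that no unknown user appears, then reports per-split counts.

```python
from collections import Counter


def check_split(all_users, assignment):
    """Hypothetical check. assignment: dict mapping user_id -> split name.

    Raises ValueError if any user is unassigned or any assigned id is unknown;
    otherwise returns the number of users in each split.
    """
    missing = set(all_users) - set(assignment)
    extra = set(assignment) - set(all_users)
    if missing or extra:
        raise ValueError(f"{len(missing)} unassigned, {len(extra)} unknown users")
    counts = Counter(assignment.values())
    return {split: counts.get(split, 0) for split in ("train", "dev", "test")}
```

Because the assignment is a dict keyed by user id, no user can land in two splits, so checking coverage in both directions is enough to guarantee the splits partition the user set.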