BabakHemmatian / Gay_Marriage_Corpus_Study

LDA and RNN for Reddit comments

Add arg to select_random_comments to only select from comments with length > min_comm_length #14

Closed: sabjoslo closed this issue 6 years ago.

sabjoslo commented 6 years ago

See title.
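For illustration only, here is a minimal sketch of what the requested argument could look like. It is not the repository's code. The signature, the `seed` parameter, and the representation of comments as plain strings are hypothetical. It also assumes that "length" means the number of whitespace-separated tokens, and it follows the title in keeping only comments strictly longer than `min_comm_length`.

```python
import random
from typing import List, Optional, Sequence


def select_random_comments(
    comments: Sequence[str],
    n: int,
    min_comm_length: Optional[int] = None,
    seed: Optional[int] = None,
) -> List[str]:
    """Hypothetical sketch: draw n comments uniformly without replacement.

    If min_comm_length is given, only comments whose length (assumed here to
    be the number of whitespace-separated tokens) is strictly greater than
    min_comm_length are eligible for sampling.
    """
    if min_comm_length is None:
        eligible = list(comments)
    else:
        eligible = [c for c in comments if len(c.split()) > min_comm_length]
    if n > len(eligible):
        raise ValueError(
            f"requested {n} comments but only {len(eligible)} are eligible"
        )
    rng = random.Random(seed)  # local RNG so a seed gives a reproducible sample
    return rng.sample(eligible, n)


if __name__ == "__main__":
    comments = ["too short", "this one is long enough", "also long enough here"]
    # Only the two comments with more than 3 tokens can be drawn.
    print(select_random_comments(comments, 2, min_comm_length=3, seed=0))
```

Applying the filter before sampling, instead of resampling until a long comment turns up, keeps the draw uniform over the eligible comments. It also makes an impossible request fail loudly.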

BabakHemmatian commented 6 years ago

There's another issue with the random subsample. To calculate topic contributions properly, we need a list of how many sampled instances come from each year, and that isn't clear for the first few years when we're sampling randomly. If you could add an output to your sampling function that includes those per-year counts, I could finish the analysis and write to all collaborators with the results. Thanks!
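As a rough illustration of the requested output, the sketch below returns the per-year counts alongside the sample. It is hypothetical and does not claim to be the change made in the repository. The function name is invented, and comments are assumed to be `(year, text)` pairs.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def select_random_comments_with_counts(
    comments: Sequence[Tuple[int, str]],
    n: int,
    seed: Optional[int] = None,
) -> Tuple[List[Tuple[int, str]], Dict[int, int]]:
    """Hypothetical sketch: sample n (year, text) pairs without replacement
    and also return how many sampled comments came from each year."""
    rng = random.Random(seed)
    sample = rng.sample(list(comments), n)
    counts = Counter(year for year, _ in sample)
    # Sort by year so the counts read chronologically.
    return sample, dict(sorted(counts.items()))


if __name__ == "__main__":
    data = [(2006, "a"), (2006, "b"), (2007, "c"), (2008, "d"), (2008, "e")]
    sample, per_year = select_random_comments_with_counts(data, 3, seed=42)
    print(per_year)  # per-year counts for the 3 sampled comments
```

The per-year counts always sum to `n`. Downstream code could divide each year's topic totals by these counts instead of relying on the year's share of the full corpus.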

sabjoslo commented 6 years ago

@BabakHemmatian, see ecbc10f.