Closed TaoRuan-Campus closed 1 year ago
I don't have a code example for you, but at a high level, the construction process was something like:
1. Select X popular subreddits.
2. Take Y conversations from each subreddit corpus within a given time period, where each conversation has at least N utterances.
reddit-corpus-small did this with X=100, Y=100 and N=100. One peculiarity is that it defines conversations as starting from a top-level comment, whereas in the subreddit corpora themselves, the conversation starts from the Reddit post.
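For what it's worth, the process described above could be sketched roughly as follows. This is not the script that produced reddit-corpus-small and it does not use the ConvoKit API; the `Conversation` class, the `build_small_corpus` function, and the choice of total utterance count as the "popularity" measure are all assumptions made for illustration. It also does not model the top-level-comment peculiarity.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Conversation:
    # Hypothetical stand-in for one conversation in a subreddit corpus.
    subreddit: str
    timestamp: int                      # Unix time of the first utterance
    utterances: list = field(default_factory=list)


def build_small_corpus(conversations, x, y, n, start, end, seed=0):
    """Pick the X most popular subreddits, then sample up to Y conversations
    from each that start in [start, end) and have at least N utterances."""
    rng = random.Random(seed)

    # Assumption: popularity = total number of utterances per subreddit.
    popularity = {}
    for conv in conversations:
        popularity[conv.subreddit] = (
            popularity.get(conv.subreddit, 0) + len(conv.utterances)
        )
    # Sort by descending popularity, breaking ties by name for determinism.
    top = sorted(popularity, key=lambda s: (-popularity[s], s))[:x]

    selected = []
    for sub in top:
        eligible = [
            c for c in conversations
            if c.subreddit == sub
            and start <= c.timestamp < end
            and len(c.utterances) >= n
        ]
        # Sample without replacement; take everything if fewer than Y exist.
        selected.extend(rng.sample(eligible, min(y, len(eligible))))
    return selected


# Toy data: subreddit "a" is the most active, "c" the least.
toy = (
    [Conversation("a", t, ["u1", "u2", "u3"]) for t in range(5)]
    + [Conversation("b", t, ["u1", "u2"]) for t in range(4)]
    + [Conversation("c", 0, ["u1"])]
)
small = build_small_corpus(toy, x=2, y=2, n=2, start=0, end=100)
```

With real data, the input would come from the individual subreddit corpora, and each selected conversation would then need to be re-rooted at its top-level comment to match how reddit-corpus-small defines a conversation.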
from convokit import Corpus, download
corpus = Corpus(filename=download("reddit-corpus-small"))
Could you please give an example of how to construct a Reddit corpus such as "reddit-corpus-small"?