Closed JiunHaoJhan closed 4 years ago
Hi there! Take a look at https://github.com/facebookresearch/EmpatheticDialogues/blob/master/empchat/datasets/reddit.py#L18 and see the structure of the data when it's loaded in. You'll need to create the 'w', 'cstart', and 'cend' keys to represent the concatenated word tokens of all sentences, the start indices of all sentences, and the end indices of all sentences, respectively.
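For illustration, here is a minimal sketch of how those three keys could be built from a list of raw sentences. It is not the repo's actual preprocessing: `build_chunk`, the toy `vocab`, whitespace tokenization, and the choice of exclusive end indices are all assumptions, and the loader in reddit.py may expect additional keys or tensor types, so check it against the linked file.

```python
def build_chunk(sentences, word_to_idx, unk_idx=0):
    """Hypothetical helper: pack sentences into flat 'w' / 'cstart' / 'cend' lists."""
    words, starts, ends = [], [], []
    for sentence in sentences:
        # Assumption: lowercase + whitespace split stands in for the real tokenizer.
        tokens = sentence.lower().split()
        starts.append(len(words))  # index of this sentence's first token in 'w'
        words.extend(word_to_idx.get(tok, unk_idx) for tok in tokens)
        ends.append(len(words))    # assumption: end index is exclusive
    return {"w": words, "cstart": starts, "cend": ends}


# Toy vocabulary, purely illustrative.
vocab = {"<unk>": 0, "hello": 1, "there": 2, "how": 3, "are": 4, "you": 5}
chunk = build_chunk(["Hello there", "How are you"], vocab)

# Sentence i can then be recovered as chunk["w"][cstart[i]:cend[i]].
second = chunk["w"][chunk["cstart"][1]:chunk["cend"][1]]

# With PyTorch installed, the dict could be written out along these lines
# (the file name is an assumption):
# torch.save(chunk, "chunk0.pth")
```

Storing one flat token list plus start/end offsets avoids padding every sentence to the same length; whether the loader wants plain lists or tensors is something to confirm in reddit.py.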
Thanks for your reply.
Hi, may I know how to convert the raw data in the Reddit dataset into the chunk.pth file loaded in reddit.py? I have downloaded the Reddit dataset, but I have no idea how to process the raw data so that it works with the RedditDataset class in reddit.py.
I have checked the issue, but I still cannot understand how to produce the format required for that file.