Closed avinashsai closed 3 years ago
Hi @avinashsai - sorry for the delay! The 'prompt' and 'utterance' columns can be the same if the Speaker simply gives the prompt as the first utterance of the conversation. Can you give me the pandas command that is failing when you try to load these files? I can try to reproduce it.
`import pandas as pd
data = pd.read_csv('train.csv')
Traceback (most recent call last):
File "
I've just looked into this. The way this repo loads that file is by reading it in as a text file and then processing it line by line, here: https://github.com/facebookresearch/EmpatheticDialogues/blob/master/empchat/datasets/empchat.py#L84 I'd try that instead of loading it as a pandas DataFrame directly.
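A minimal sketch of that line-by-line approach, for anyone who wants to stay outside the pandas CSV parser. This is not the repo's code: the helper name `read_rows` and the demo data are made up, and it assumes fields are separated by plain commas with no quoting rules applied, so a stray `"` is kept as ordinary text.

```python
import os
import tempfile


def read_rows(path):
    """Read a comma-separated file line by line, treating quotes as plain text."""
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    header = lines[0].strip().split(",")
    rows = []
    for line in lines[1:]:
        parts = line.strip().split(",")
        # Pair each header name with its value. Extra or missing fields are
        # not handled here (zip truncates to the shorter sequence).
        rows.append(dict(zip(header, parts)))
    return rows


# Tiny demo file (made-up content) containing an unbalanced double quote.
demo = 'conv_id,utterance_idx,utterance\nc1,1,hello\nc1,2,"I said hi\nc2,1,next\n'
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(demo)
rows = read_rows(tmp.name)
os.remove(tmp.name)
```

If a DataFrame is still wanted afterwards, `pd.DataFrame(rows)` should accept this list of dicts.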
I worked around this by escaping every double quote, i.e. replacing " with \":
sed -i 's/"/\\"/g' train.csv
Then I read it with: df = pd.read_csv("train.csv", sep=",", encoding='utf-8', engine="python", escapechar="\\")
Although the issue has already been closed, I would like to propose another solution. If this solution has any hidden problems, please let me know.
df = pd.read_csv("./train.csv", usecols=['conv_id', 'utterance_idx', 'context', 'prompt', 'speaker_idx', 'utterance', 'selfeval', 'tags'])
However, if you look into train.csv, you will find the following problem: for hit:832_conv:1665, the utterance column has absorbed text that is supposed to be on the following lines.
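One plausible explanation, stated here as an assumption since the thread does not confirm it: an unbalanced `"` opens a quoted field, and a standard CSV parser keeps reading until the next quote, pulling later lines into that field. The sketch below uses Python's built-in `csv` module on made-up data to flag rows in which a field contains a newline; `find_swallowed_rows` is a hypothetical helper, not part of the repo.

```python
import csv
import io


def find_swallowed_rows(text):
    """Return (end_line, first_field) for rows where a field spans several lines."""
    reader = csv.reader(io.StringIO(text, newline=""))
    suspects = []
    for row in reader:
        if any("\n" in field for field in row):
            # reader.line_num is the number of source lines consumed so far.
            suspects.append((reader.line_num, row[0]))
    return suspects


# Made-up sample: the third line opens a quote that is never closed.
sample = (
    "conv_id,utterance_idx,utterance\n"
    "c1,1,hello\n"
    'c1,2,"I said hi\n'
    "c2,1,next row\n"
    "c2,2,last\n"
)
suspects = find_swallowed_rows(sample)
```

Running the same check over the real train.csv (read with `open(path, newline="")`) would list the conversations worth inspecting by hand.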
When I tried to load the train.csv, I observed these errors:
I haven't checked valid.csv and test.csv. Please fix these issues in the files.
Thank you