UnicodeDecodeError - Githubissues

Hi,

When reading CSV files, for example with redditDf = pandas.read_csv('data/reddit.csv', index_col=0),

I've run into this error multiple times:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 3304: invalid continuation byte
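For reference, the same kind of message can be reproduced in isolation. This is only a minimal sketch: it assumes the offending bytes come from text saved in a single-byte encoding such as Latin-1, which may or may not be what is in reddit.csv, and the sample string is made up for illustration.

```python
# Hypothetical sample: "naïve" stored as Latin-1, so "ï" is the single byte 0xef.
raw = "naïve".encode("latin-1")

message = None
try:
    # In UTF-8, 0xef announces a three-byte sequence, but the next byte is
    # plain ASCII "v", so the decoder gives up at that position.
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    message = str(exc)
    position = exc.start

print(message)
```

If that assumption holds, the error says the file is not UTF-8 at all, rather than that a few rows are damaged.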

For my own data set, I Googled the error and fixed it with

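import codecs
import pandas as pd

# Drop bytes that are not valid UTF-8, then skip rows that fail to parse.
# (pandas < 1.3 spells the last option error_bad_lines=False)
with codecs.open('xxx.csv', 'r', encoding='utf-8',
                 errors='ignore') as data2:
    df = pd.read_csv(data2, on_bad_lines='skip')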

which simply drops the bytes and rows that cause errors. This works well for my own data set because the problem is minor there (very few observations are lost), but for the reddit data the same code leaves only one row in the data frame. How can I fix this?
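One direction that might be worth trying, instead of discarding bytes, is to decode the file with a different encoding. The sketch below is an assumption-laden illustration, not a confirmed fix: decode_with_fallback and its candidate list are hypothetical names and choices, and the real encoding of reddit.csv is not known from the error alone.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes raw.

    The candidate list is an assumption. latin-1 accepts every byte, so it
    never fails and is kept last; it can still yield wrong characters.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    raise ValueError("none of the candidate encodings could decode the data")


# Hypothetical sample bytes; a real file would be read with open(path, "rb").
text, used = decode_with_fallback("naïve".encode("latin-1"))
print(used, text)
```

The decoded text can then be wrapped in io.StringIO and passed to pandas.read_csv, or the encoding that worked can be passed directly through the encoding argument of pandas.read_csv.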

Thanks!

Computational-Content-Analysis-2020 / frequently-asked-questions

UnicodeDecodeError #15