Currie32 / Spell-Checker

A seq2seq model that can correct spelling mistakes.
213 stars, 93 forks

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte #14

Open olixbridge opened 5 years ago

olixbridge commented 5 years ago

I wrote code very similar to the repository's and replaced the path with path = '/Users/oliviashi/Documents/software/python/books/'

However, I cannot run the file because of this error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte

What should I do? Thank you very much!

olixbridge commented 5 years ago

(venv) (Python37) Olivias-MBP:python oliviashi$ python3 /Users/oliviashi/Documents/software/python/DataGenerator.py
Traceback (most recent call last):
  File "/Users/oliviashi/Documents/software/python/DataGenerator.py", line 31, in <module>
    books.append(load_book(path+book))
  File "/Users/oliviashi/Documents/software/python/DataGenerator.py", line 18, in load_book
    book = f.read()
  File "/Users/oliviashi/Documents/software/python/venv/bin/../lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte
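
For readers unsure what the traceback is reporting: in UTF-8 the byte 0x80 is only valid as a continuation byte, so a strict decoder rejects it when it appears where a character should start. The sketch below reproduces the same error on a made-up byte string (the sample data is an assumption, not the contents of the actual book file) and reads the failing offset from the exception.

```python
# Hypothetical sample: ASCII text with a lone 0x80 byte in the middle.
data = b"abc\x80def"

try:
    data.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# exc.start is the offset of the offending byte; exc.reason says why it failed.
offset = error.start
reason = error.reason
bad_byte = data[offset]
print(f"byte {bad_byte:#x} at position {offset}: {reason}")
```

Applied to a real file, the same attributes show which byte to inspect (position 3131 in the report above), which can help in guessing the file's actual encoding.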

ashupednekar commented 5 years ago

same issue here

atishSanyal03 commented 5 years ago

Just change the line

with open(input_file) as f:

to

with open(input_file, encoding='windows-1252') as f:
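
A minimal sketch of that suggestion, assuming load_book simply opens the file and returns its text (the names load_book and input_file come from the traceback; the encoding parameter and the demo file are assumptions added for illustration, not the repository's code):

```python
import os
import tempfile


def load_book(input_file, encoding="windows-1252"):
    """Read a text file with an explicit encoding (illustrative sketch)."""
    with open(input_file, encoding=encoding) as f:
        return f.read()


# Hypothetical demo file containing byte 0x80, the euro sign in windows-1252.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"price: \x80 5")
    demo_path = tmp.name

book = load_book(demo_path)
os.remove(demo_path)
print(book)
```

This only yields correct text if the file really is windows-1252. Python's codec for that encoding also leaves a few byte values undefined (0x81, for example), so some files can still raise UnicodeDecodeError.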

franzvalo1 commented 5 years ago

or you can:

with open(input_file, 'rb') as f:
    book = f.read().decode(errors='replace')
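
A rough, self-contained sketch of this second approach: read the raw bytes and decode them as UTF-8, substituting the replacement character U+FFFD for anything undecodable. The function name load_book_lenient and the demo file are hypothetical, used only for illustration.

```python
import os
import tempfile


def load_book_lenient(input_file):
    """Read bytes and decode as UTF-8, replacing invalid bytes with U+FFFD."""
    with open(input_file, "rb") as f:
        return f.read().decode("utf-8", errors="replace")


# Hypothetical demo file with one byte that is invalid in UTF-8.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"abc\x80def")
    demo_path = tmp.name

text = load_book_lenient(demo_path)
os.remove(demo_path)
print(repr(text))
```

The trade-off is that the read never fails, but each bad byte is lost and shows up as U+FFFD in the training text. Passing errors='replace' directly to open() in text mode should behave much the same way.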