UnicodeDecodeError

Hi Allen,

https://de.dariah.eu/tatom/preprocessing.html#every-1-000-words

```python
def split_text(filename, n_words):
    """Split a text into chunks approximately n_words words in length."""
    input = open(filename, 'r')
    words = input.read().split(' ')
    input.close()
```

The problem is the line "input = open(filename, 'r')".

I wonder whether "input = open(filename, 'r', encoding='UTF-8')" would be better.

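Otherwise you may get the error message "UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 10: character maps to <undefined>". Without an explicit encoding, open() falls back to the platform's default encoding, which on Windows is often a legacy code page such as cp1252 rather than UTF-8.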
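A minimal sketch of the suggested fix, assuming the input texts are UTF-8 encoded. It first reproduces the kind of error above: the UTF-8 bytes for "Ï" are 0xC3 0x8F, and 0x8F has no mapping in cp1252. It then reads the same bytes back through a version of split_text that passes encoding='utf-8'. The chunking line is an illustrative stand-in, not the tutorial's exact code, and the sample string and temporary file are hypothetical.

```python
import os
import tempfile


def split_text(filename, n_words):
    """Split a text into chunks approximately n_words words in length.

    Sketch: the file is opened with an explicit UTF-8 encoding instead of
    relying on the platform default (e.g. cp1252 on many Windows systems).
    The chunking below is an illustrative assumption, not the tutorial's code.
    """
    # 'with' closes the file even if decoding fails part-way through
    with open(filename, 'r', encoding='utf-8') as f:
        words = f.read().split(' ')
    # Group consecutive words into chunks of n_words; the last chunk may be shorter
    return [' '.join(words[i:i + n_words]) for i in range(0, len(words), n_words)]


# Hypothetical sample text: 'Ï' (U+00CF) is stored in UTF-8 as bytes 0xC3 0x8F
data = 'Ïle de beauté'.encode('utf-8')

# Decoding those bytes as cp1252 fails on 0x8F, which cp1252 leaves undefined
try:
    data.decode('cp1252')
except UnicodeDecodeError as err:
    print(err)

# Write the bytes to a temporary file and split it with the UTF-8-aware function
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(data)
try:
    print(split_text(tmp.name, 2))
finally:
    os.remove(tmp.name)
```

Using a with block instead of a separate close() call is a small extra change; it keeps the file from staying open if read() raises.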

ariddell / tatom

UnicodeDecodeError #11