Open dkltimon opened 9 years ago
You're completely right. Thanks for the report.
I'm very used to Linux and OS X where the default encoding is frequently utf-8 and you don't need to specify utf-8 under Python 3. For the longest time I assumed that utf-8 was actually the fixed default for Python 3.
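For anyone who wants to check what their own machine does, here is a minimal sketch. It assumes that `locale.getpreferredencoding(False)` reflects the encoding `open()` falls back to when no `encoding` argument is given, which is how the Python 3 documentation has described it. The helper name `default_text_encoding` is made up for illustration.

```python
import locale


def default_text_encoding():
    """Return the encoding the locale reports as the text-file default."""
    # Passing False asks for the current setting without changing the locale.
    return locale.getpreferredencoding(False)


# Often a UTF-8 variant on Linux and OS X, and frequently a legacy
# code page such as cp1252 on Windows.
print(default_text_encoding())
```

If this prints something other than a UTF-8 variant, any `open()` call without an explicit `encoding` will decode files with that codec instead.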
Hi Allen,
https://de.dariah.eu/tatom/preprocessing.html#every-1-000-words
    def split_text(filename, n_words):
        """Split a text into chunks approximately `n_words` words in length."""
        input = open(filename, 'r')
        words = input.read().split(' ')
        input.close()

At the line "input = open(filename, 'r')", I wonder whether "input = open(filename, 'r', encoding='UTF-8')" would be better.
Otherwise you may get the error message: "UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 10: character maps to <undefined>".
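A rough sketch of what the suggested change could look like is below. Only the `open()` call with an explicit encoding comes from the suggestion above. The `encoding` parameter, the `with` block, the whitespace split, and the chunking at the end are assumptions added for illustration, not the tutorial's actual code.

```python
def split_text(filename, n_words, encoding='utf-8'):
    """Split a text into chunks approximately `n_words` words in length."""
    # Pass the encoding explicitly so decoding does not depend on the
    # platform default (a legacy code page on many Windows setups).
    with open(filename, 'r', encoding=encoding) as f:
        words = f.read().split()
    # Illustrative chunking (an assumption): join every n_words words.
    return [' '.join(words[i:i + n_words])
            for i in range(0, len(words), n_words)]
```

With the encoding fixed, a UTF-8 file containing byte 0x8f as part of a multi-byte character is decoded the same way on every platform, instead of failing under cp1252, which has no character assigned to 0x8f.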