Enable encoding detection for the txt parser

deanmalmgren / textract

extract text from any document. no muss. no fuss.

http://textract.readthedocs.io

MIT License


Enable encoding detection for the txt parser #456

Open LoicGrobol opened 1 year ago

LoicGrobol commented 1 year ago

As of now, the txt parser reads files in text mode as UTF-8, so it fails on files in any other encoding. This change makes it return a bytes object instead, leaving the base parser's decode step to detect the encoding and decode the text accordingly.
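To illustrate the idea, here is a minimal standalone sketch, not textract's actual code. The names `extract`, `decode`, `demo`, and `CANDIDATE_ENCODINGS` are hypothetical, and the fixed fallback list is an assumption that stands in for whatever detection the base decode step really performs.

```python
import os
import tempfile

# Hypothetical candidate list (an assumption for this sketch); a real
# detector would inspect the bytes rather than walk a fixed list.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def extract(path):
    """Read the file in binary mode so that no encoding is assumed."""
    with open(path, "rb") as stream:
        return stream.read()


def decode(raw):
    """Return the text from the first candidate encoding that decodes cleanly."""
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 accepts every byte, so this only runs if the list is changed.
    return raw.decode("utf-8", errors="replace")


def demo():
    """Write a Latin-1 file, then read it back through extract and decode."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as handle:
        handle.write("café".encode("latin-1"))
        path = handle.name
    try:
        return decode(extract(path))
    finally:
        os.remove(path)
```

Opening the same file in text mode as UTF-8 would raise `UnicodeDecodeError`, which is the failure described above. Returning bytes defers the encoding decision to a single decode step.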