deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License

Fix Parser to ignore encoding errors #485

Open: DmitryMalishev opened this pull request 9 months ago

DmitryMalishev commented 9 months ago

This PR could fix errors like "codec can't decode byte XXX in position YYY". These errors currently happen quite often, and no API parameter avoids them. The only workaround is to patch textract's source directly.
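The PR text does not include its diff, so the following is only a minimal sketch of the general technique named in the title. It assumes the parser decodes raw bytes with Python's built-in `bytes.decode`. Passing `errors="ignore"` skips undecodable bytes instead of raising `UnicodeDecodeError`, which is the exception behind the "codec can't decode byte" message. The helper `decode_ignoring_errors` is hypothetical and is not part of textract's API.

```python
def decode_ignoring_errors(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode raw bytes, dropping invalid sequences."""
    # errors="ignore" silently skips bytes that are invalid in `encoding`
    # instead of raising UnicodeDecodeError, which is what the default
    # errors="strict" does.
    return raw.decode(encoding, errors="ignore")


raw = b"caf\xc3\xa9 \xff menu"  # \xff can never start a valid UTF-8 sequence

try:
    raw.decode("utf-8")  # default errors="strict"
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)  # "invalid start byte"

print(decode_ignoring_errors(raw))  # the \xff byte is dropped
```

Ignoring errors is lossy. `errors="replace"` is a gentler alternative: it substitutes U+FFFD for undecodable input, so the gap stays visible in the extracted text.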