deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License

Fix issue deanmalmgren#342 #422

Open TheElementalOfDestruction opened 2 years ago

TheElementalOfDestruction commented 2 years ago

Added a fix for issue #342, which is caused by extract_msg.Message._getStringStream returning None for streams that are not found in the MSG file. That behavior is intentional, so callers need to handle it. ensure_bytes now checks for None and returns an empty bytes string in that case.
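A minimal sketch of the None handling described above, written for Python 3 only. It is not textract's actual ensure_bytes. The UTF-8 encoding and the TypeError for unexpected types are assumptions made for illustration, and the stream values in the usage lines are hypothetical.

```python
from typing import Optional, Union


def ensure_bytes(value: Optional[Union[str, bytes]]) -> bytes:
    """Normalize a string-stream value to bytes (illustrative sketch)."""
    # A stream missing from the MSG file comes back as None: treat it as empty.
    if value is None:
        return b""
    # Text (str on Python 3) is encoded; UTF-8 is an assumption of this sketch.
    if isinstance(value, str):
        return value.encode("utf-8")
    # Values that are already bytes pass through unchanged.
    if isinstance(value, bytes):
        return value
    # Any other type breaks the expected contract, so fail loudly (assumed policy).
    raise TypeError("unexpected stream type: %s" % type(value).__name__)


# Hypothetical stream values, as _getStringStream might return them.
subject = None          # stream not found in the MSG file
body = "Hello, world"   # str on Python 3

text = ensure_bytes(subject) + b"\n" + ensure_bytes(body)
print(text)  # b'\nHello, world'
```

Without the None check, the missing subject would make the concatenation fail, which is the kind of error the fix avoids. Raising on other types is a stricter choice than plain pass-through. It surfaces a contract violation early instead of letting it propagate.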

Additionally, updated the names used in the documentation to match the current naming.

To clarify: apart from returning None for a missing stream, extract_msg.Message._getStringStream should only ever return unicode on Python 2 and str on Python 3. Any other return type is a bug and should be reported.

(By the way, the contributing guidelines say to use issue2pr for existing issues, but I couldn't get it to work at all.)