deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License

ppt support #398

Open idoabelman opened 3 years ago

idoabelman commented 3 years ago

I see that for Word documents both the old .doc format and the new .docx format are supported, and similarly for Excel (.xls and .xlsx), but for PowerPoint files only the new format (.pptx) is supported. Is there a reason for that? If not, it seems like an obvious feature to implement, if only for completeness' sake.

jpweytjens commented 3 years ago

If there's a tool out there that can parse the old PowerPoint format, feel free to create a PR. I think LibreOffice's command-line tool can, but perhaps there are better options.
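Until .ppt support lands, one possible workaround along the lines suggested above is to convert the legacy file to .pptx with LibreOffice and pass the result to textract. The sketch below assumes LibreOffice is installed and its `soffice` (or `libreoffice`) binary is on PATH, and that textract is installed. The helper names `build_convert_command` and `ppt_to_text` are hypothetical and are not part of textract's API. This is not how textract itself implements any parser.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def build_convert_command(ppt_path, outdir, soffice="soffice"):
    """Hypothetical helper: build the LibreOffice command that converts .ppt to .pptx."""
    return [
        soffice,
        "--headless",            # run LibreOffice without opening a window
        "--convert-to", "pptx",  # target format, selected by file extension
        "--outdir", str(outdir), # directory where the converted file is written
        str(ppt_path),
    ]


def ppt_to_text(ppt_path):
    """Hypothetical workaround: convert a .ppt file with LibreOffice, then extract text with textract."""
    import textract  # imported lazily so build_convert_command works without textract installed

    # Assumption: LibreOffice is installed under one of these binary names.
    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise RuntimeError("LibreOffice (soffice) was not found on PATH")

    ppt_path = Path(ppt_path)
    with tempfile.TemporaryDirectory() as outdir:
        subprocess.run(
            build_convert_command(ppt_path, outdir, soffice),
            check=True,            # raise if the conversion fails
            capture_output=True,   # keep LibreOffice's console output quiet
        )
        # LibreOffice keeps the base name and swaps the extension.
        pptx_path = Path(outdir) / (ppt_path.stem + ".pptx")
        # textract.process returns bytes, so decode them to a str.
        return textract.process(str(pptx_path)).decode("utf-8")
```

Usage would look like `print(ppt_to_text("deck.ppt"))`. Shelling out keeps the workaround outside textract. A real PR would more likely add a dedicated parser for the `.ppt` extension instead.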