deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License
3.89k stars 599 forks

Code and test for the PDF fallback to Tesseract feature #245

Open parkerhancock opened 6 years ago

parkerhancock commented 6 years ago

See https://github.com/deanmalmgren/textract/issues/244

traverseda commented 3 years ago

I've recently become a maintainer of textract.

Unfortunately, I've had to remove the "download_file" test case, as it was failing intermittently. If you're still interested in this PR, just get the test cases passing again and I'll merge it right away.