deanmalmgren / textract

extract text from any document. no muss. no fuss.
http://textract.readthedocs.io
MIT License
3.86k stars 590 forks

Pdfminer on Windows searches for pdf2text.py.exe #424

Open PeterTillema opened 2 years ago

PeterTillema commented 2 years ago

When extracting a PDF with the pdfminer method, textract looks for an application called pdf2text.py, but the spawn package automatically appends .exe to that name. The resulting file, pdf2text.py.exe, obviously doesn't exist, so an exception is thrown.
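One possible way around the `.exe` suffix, sketched here as an assumption and not as textract's actual code, is to skip executable lookup altogether: find the script file on `PATH` by its exact name and pass it to the current Python interpreter. The helper names `find_script` and `build_command` are hypothetical and exist only for this illustration.

```python
import os
import sys


def find_script(script_name, search_path=None):
    """Return the full path of script_name found on PATH, or None.

    Hypothetical helper: it matches the exact file name, so no
    extension such as .exe is ever appended.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, script_name)
        if os.path.isfile(candidate):
            return candidate
    return None


def build_command(script_name, *args, search_path=None):
    """Build an argv list that runs the script with this interpreter."""
    script = find_script(script_name, search_path)
    if script is None:
        raise FileNotFoundError(f"{script_name} not found on PATH")
    # Running the script through sys.executable means Windows never has
    # to treat the .py file as an executable.
    return [sys.executable, script, *args]
```

The returned list can be passed to `subprocess.run`, for example `subprocess.run(build_command("pdf2text.py", "file.pdf"), capture_output=True)`, where the script name is the one quoted in this report and `file.pdf` is a placeholder.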

CaseGuide commented 2 years ago

Same issue here. PDF extraction is completely broken on Windows: after a plain pip install, extracting text from a PDF fails with both the pdfminer and tesseract methods.