dbashford / textract

node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
MIT License

'pdftotext' does not appear to be installed #213

Open: codingalien-d opened this issue 3 years ago

codingalien-d commented 3 years ago

We are getting an error with pdftotext. We installed the package in /usr/local/bin.

It works on our local machine, but when we tried it on the server, we got this error:

extractor for type exists, but failed to initialize. Message: INFO: 'pdftotext' does not appear to be installed, so textract will be unable to extract PDFs.

Did we miss any packages or steps? Please help us solve this issue.

OS: Ubuntu

gabrielkf commented 3 years ago

Try installing poppler-utils:

sudo apt install poppler-utils

It worked for me on CentOS 5, btw, but I spent a lot of time hunting for the right RPM package.

Adding it to the README would have saved me a lot of time.