Closed nickchomey closed 1 year ago
I'm going to close this for now - the problem seems to have gone away. My best guess is that I was accidentally using the wrong Python interpreter - it would explain why pdftotext -v worked in the CLI, but not while running the Python application.
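One way to test the "wrong interpreter" guess is to ask the Python process itself which interpreter it is and whether it can see pdftotext on its PATH. This is only a minimal sketch: the helper name describe_environment is made up for illustration and is not part of Haystack.

```python
import shutil
import sys


def describe_environment(tool="pdftotext"):
    """Report the running interpreter and where `tool` resolves on PATH.

    `describe_environment` is a hypothetical helper, not a Haystack API.
    """
    return {
        "interpreter": sys.executable,    # the Python binary actually running
        "prefix": sys.prefix,             # root of the active environment
        "tool_path": shutil.which(tool),  # None if `tool` is not on PATH
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this from the same environment that launches the application, and again from the shell where pdftotext -v works, should show whether the two are using different interpreters or different PATH settings.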
Never mind. I don't think it was the interpreter. I think what fixed it was that I forgot that I had removed the PDFToTextConverter stuff from /rest_api/pipeline/pipelines.haystack-pipeline.yml and then reinstalled with pip install rest_api/
...
I just uninstalled rest_api and reinstalled it, after having re-cloned the repo (and restoring the yml), and I get this error again.
PDFToTextConverter seems to be working for me now. I really don't know what I changed... Perhaps some venv stuff...
Hi @vibha0411, let me follow up on your message from #182 here, as the two seem related.
I could only reproduce the bug by deleting the PDFToTextConverter class in the pdf.py file. Have you made any similar changes?
Can you also share information on your OS and Haystack version?
My OS is macOS Monterey. I installed Haystack from the repo (main branch).
The only major change I have made is in /haystack/document_stores/elasticsearch.py, where I am connecting to an Elastic Cloud instance instead of localhost:9200.
I also reverted all the changes and I still get the error :(
I tried to reproduce your error and I could. For me, the error only occurs in my miniconda3 environments. My miniforge environments work just fine. I am not sure if this is the real issue tbh, as I have limited knowledge here. I'll keep you updated 👍
Yes I am using a miniconda3 environment as well... Thanks! Please keep me updated
Hi @vibha0411, updates here! Apparently, the problem is not about miniconda3 vs miniforge. Sorry for misguiding you there 😞
I realized that when we install Haystack with pip install haystack/, not all the packages PDFToTextConverter needs are installed. The pdf2image and pytesseract packages have to be installed additionally. They come from the ocr dependency option listed under custom installation. To install them, you can either use pip install -e '.[ocr]' in the haystack folder or try pip install -e haystack/'.[ocr]' from one level above in the directory.
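To check whether the two extra packages are actually visible to the interpreter that runs the app, something like the following could help. It is a rough sketch, not Haystack code: missing_modules is a hypothetical helper, and it only tests whether the modules can be found, not whether the system tools behind them are installed.

```python
import importlib.util


def missing_modules(names=("pdf2image", "pytesseract")):
    """Return the top-level module names that cannot be found for import.

    `missing_modules` is a hypothetical helper; the default names are the
    two packages mentioned in this thread.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Not importable in this environment:", ", ".join(missing))
    else:
        print("pdf2image and pytesseract are both importable.")
```

If either name is reported as missing, installing the ocr extra as described above into that same environment is the thing to try.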
Thanks a loooot @bilgeyucel it finally worked!!!!!
Describe the bug
I want to run the Demo site without Docker.
I installed Haystack with
Then I tried to run the REST API server without Docker, as per your documentation
gunicorn rest_api.application:app -b 0.0.0.0:8000 -k uvicorn.workers.UvicornWorker -t 300
But I get the following error
Error message
I then installed xpdf using the command from the Dockerfile (is this necessary? It isn't shown in your documentation)
and confirmed that it is available with
System: