deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Docker image crashes upon querying #6268

Closed r4881t closed 4 months ago

r4881t commented 10 months ago

Describe the bug

Short issue: The Docker container, which runs a Haystack-based web server, crashes while running a pipeline.

Long text: I have a web app that exposes a query API. When someone makes a POST call, the controller runs a RAG pipeline. The container crashes every time the pipeline is run, and I have been unable to find any error log. This does not happen if I run the web server natively on my OS.

Error message: No error message is thrown.

Expected behavior: The RAG pipeline should run.

Additional context: I have enabled DEBUG logging in Haystack and in my app, and I am attaching the log file docker_crash.log. It crashes exactly after outputting the text Batches: 0%| | 0/1 [00:00<?, ?it/s]%. My Dockerfile is pasted below.

FROM python:3.11 AS base

WORKDIR /app

COPY . .

RUN pip install --upgrade pip

RUN pip install --no-cache-dir -r ./requirements.txt

EXPOSE 8080

CMD ["uvicorn", "friday.server:app", "--host", "0.0.0.0", "--port", "8080"]


masci commented 10 months ago

In the logs I see:

DEBUG - haystack.telemetry -  Telemetry couldn't make a POST request to PostHog.
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/haystack/telemetry.py", line 98, in send_event
    json.dumps({**self.event_properties, **dynamic_specs, **event_properties}, sort_keys=True)
  File "/usr/local/lib/python3.11/json/__init__.py", line 238, in dumps
    **kw).encode(obj)
          ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type PromptTemplate is not JSON serializable
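The TypeError above can be reproduced outside Haystack: json.dumps rejects any object it has no encoder for. A minimal sketch, using a hypothetical stand-in class (not the real PromptTemplate):

```python
import json


class PromptTemplate:
    """Hypothetical stand-in for haystack's PromptTemplate; any plain object triggers the error."""

    def __init__(self, prompt):
        self.prompt = prompt


try:
    # Mirrors the telemetry call: serializing event properties that contain a PromptTemplate
    json.dumps({"template": PromptTemplate("Answer: {query}")}, sort_keys=True)
except TypeError as err:
    print(err)  # Object of type PromptTemplate is not JSON serializable
```

This explains the traceback but, as the thread shows, not the crash itself: the telemetry failure is caught and only logged at DEBUG level.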

Can you try disabling telemetry by setting the environment variable HAYSTACK_TELEMETRY_ENABLED=False in your Docker container?
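One way to pass the variable without rebuilding the image (a sketch; the friday tag and .env file are taken from the thread above):

```shell
# -e injects the variable on top of whatever .env already provides
docker run -p 8080:8080 -e HAYSTACK_TELEMETRY_ENABLED=False --env-file .env friday
```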

r4881t commented 10 months ago

Tried with telemetry disabled; no change in behaviour. I am on an Apple M1. Here are the additional things I did:

  1. Instead of docker run -p 8080:8080 --env-file .env friday, I opened an interactive terminal in the container and started uvicorn from there. I noticed that a segmentation fault happens while the pipeline runs.
  2. I noticed that I had use_gpu=True in my EmbeddingRetriever, so I set it to False, rebuilt the image, and repeated step 1.
  3. That didn't change anything, so I attached a debugger.
  4. I ran bt in the debugger and am attaching the logs here. I am not much of an expert in low-level debugging, but it seems to be something to do with the PyTorch libs. docker_crash_with_gdb.log

I will continue to fiddle more.
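One quick check worth running inside the container (a suggestion not taken from the thread): segmentation faults in PyTorch under Docker on Apple silicon are often caused by running an amd64 image under qemu emulation rather than a native arm64 image. The architecture the Python process actually sees can be printed with:

```python
import platform
import sys

# Under emulation on Apple silicon this reports "x86_64" even though
# the host machine is arm64/aarch64.
print(platform.machine())
print(sys.platform)
```

If this prints x86_64 inside the container on an M1, rebuilding the image for linux/arm64 (e.g. docker build --platform linux/arm64) would be the first thing to try.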

masci commented 10 months ago

Can you also try setting HAYSTACK_MPS_ENABLED=false?

r4881t commented 10 months ago

I tried with both HAYSTACK_MPS_ENABLED=False and HAYSTACK_MPS_ENABLED=false in the env, and it still crashes. Attached is the gdb log when HAYSTACK_MPS_ENABLED is set to false. docker_crash2.log

masci commented 10 months ago

I tried to reproduce on my Mac using the docker-compose file from https://github.com/deepset-ai/haystack-demos/tree/main/explore_the_world, pulling 1.21.1, but it works. If you can post a minimal setup that I can reproduce locally, I'll try again.

alizingly commented 8 months ago

I faced the same issue but then realized that the problem was caused by wrong OpenSearch credentials. I don't know why the error message is so far off from the actual cause.
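When bad credentials are a suspect, it can help to verify the OpenSearch connection separately before wiring up the document store. A minimal sketch using only the standard library; the helper name, host, and credentials are placeholders, not part of Haystack:

```python
import base64
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def opensearch_credentials_ok(host="http://localhost:9200",
                              user="admin", password="admin"):
    """Return True if the cluster accepts the credentials (hypothetical helper)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = Request(host, headers={"Authorization": f"Basic {token}"})
    try:
        with urlopen(req, timeout=5) as resp:
            return resp.status == 200
    except HTTPError:
        # 401/403 lands here: the host is up but the credentials were rejected
        return False
    except URLError:
        # Host unreachable: a different problem than bad credentials
        return False
```

Running this once at startup (and logging the result) would surface a credential problem directly, instead of letting it manifest as an unrelated-looking failure deep inside a pipeline run.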