deepset-ai / hayhooks

Deploy Haystack pipelines behind a REST API.
https://haystack.deepset.ai
Apache License 2.0

Optional Argument of a pipeline must be provided #37

Open Shuntw6096 opened 2 months ago

Shuntw6096 commented 2 months ago

Environment: I use Docker.

ARG build_image
ARG base_image

FROM $build_image AS build-image

RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    build-essential \
    git

ARG hayhooks_version

# Shallow clone Hayhooks repo, we'll install from the local sources
RUN git clone --depth=1 --branch=${hayhooks_version} https://github.com/deepset-ai/hayhooks.git /opt/hayhooks
WORKDIR /opt/hayhooks

# Use a virtualenv we can copy over the next build stage
RUN python3 -m venv --system-site-packages /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

RUN pip install --upgrade pip && \
    pip install --no-cache-dir . && \
    pip install langchain==0.2.12 milvus-haystack==0.0.10

FROM $base_image AS final

COPY --from=build-image /opt/venv /opt/venv

ARG pipelines_dir
RUN mkdir -p $pipelines_dir
ENV HAYHOOKS_PIPELINES_DIR=$pipelines_dir

ARG additional_python_path
RUN mkdir -p $additional_python_path
ENV HAYHOOKS_ADDITIONAL_PYTHONPATH=$additional_python_path

EXPOSE 1416
ENV PATH="/opt/venv/bin:$PATH"
CMD ["hayhooks", "run", "--host", "0.0.0.0"]

Build args in Docker Compose:

    build:
      context: .
      dockerfile: dockerfile.hayhooks
      tags:
        - "ccccccc/hayhooks:local"
      args:
        build_image: "deepset/haystack:base-main"
        base_image: "deepset/haystack:base-main"
        hayhooks_version: "main"
        pipelines_dir: "/opt/pipelines"
        additional_python_path: "/opt/custom_components"
      x-bake:
        platforms:
          - linux/amd64

I have a pipeline; its diagram is below: [image: test_pipeline_01]

The retriever and ranker components both have an optional argument, top_k: Optional[int].

I call its Hayhooks API like this:

curl -X 'POST' \
  'http://localhost:1416/retrieval' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
   "query_builder": {
        "query": "cabbage"
    },
    "retriever": {"top_k": null},
    "ranker": {"top_k": null}
}'
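For anyone reproducing this from Python rather than curl, the request above can be sketched with the standard library only. The helper name `build_run_request` and its `optional_components` parameter are hypothetical, introduced just for illustration; the URL, headers, and body mirror the curl call, including the explicit `null` for each `top_k`.

```python
import json
import urllib.request


def build_run_request(url, query, optional_components=("retriever", "ranker")):
    """Build (but do not send) the POST request shown in the curl call.

    Hypothetical helper: every component listed in `optional_components`
    gets an explicit {"top_k": null} entry in the JSON body.
    """
    body = {"query_builder": {"query": query}}
    for name in optional_components:
        body[name] = {"top_k": None}  # serialized as JSON null
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_run_request("http://localhost:1416/retrieval", "cabbage")
# To actually send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Passing `optional_components=()` produces the body that triggers the "Field required" response described in this issue.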

The optional argument top_k: Optional[int] must be provided; otherwise, I get a "Field required" response:

{
    "detail": [
        {
            "type": "missing",
            "loc": ["body", "ranker"],
            "msg": "Field required",
            "input": {"query_builder": {"query": "cabbage"}}
        },
        {
            "type": "missing",
            "loc": ["body", "retriever"],
            "msg": "Field required",
            "input": {"query_builder": {"query": "cabbage"}}
        }
    ]
}
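Note that the error locations point at the whole `ranker` and `retriever` entries, not at `top_k`. A minimal sketch of how this can happen in pydantic v2, assuming the request model is assembled with `create_model` and each component is declared as a required field (this is an assumption about the cause, not a quote of the Hayhooks source; all class and model names below are hypothetical):

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError, create_model


# Hypothetical per-component input models; top_k has a default of None.
class QueryBuilderInputs(BaseModel):
    query: str


class RetrieverInputs(BaseModel):
    top_k: Optional[int] = None


class RankerInputs(BaseModel):
    top_k: Optional[int] = None


# `...` marks a field as required, so the component key itself must be present
# even though every input inside it is optional.
StrictRequest = create_model(
    "StrictRequest",
    query_builder=(QueryBuilderInputs, ...),
    retriever=(RetrieverInputs, ...),
    ranker=(RankerInputs, ...),
)

# Giving all-optional components a default lets callers omit them entirely.
LenientRequest = create_model(
    "LenientRequest",
    query_builder=(QueryBuilderInputs, ...),
    retriever=(RetrieverInputs, Field(default_factory=RetrieverInputs)),
    ranker=(RankerInputs, Field(default_factory=RankerInputs)),
)


def missing_fields(model, data):
    """Return the sorted top-level field names reported as missing."""
    try:
        model.model_validate(data)
    except ValidationError as exc:
        return sorted(e["loc"][0] for e in exc.errors() if e["type"] == "missing")
    return []


payload = {"query_builder": {"query": "cabbage"}}
strict_missing = missing_fields(StrictRequest, payload)
lenient_missing = missing_fields(LenientRequest, payload)
```

Under these assumptions the strict model rejects the short payload for both omitted components, while the lenient model accepts it and fills in `top_k=None`.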

Also, http://localhost:1416/docs#/ stops working after I deploy the pipeline. The page shows "Failed to load API definition", and the server logs this error:

2024-09-02T07:59:57.164733458Z pydantic.errors.PydanticInvalidForJsonSchema: Cannot generate a JsonSchema for core_schema.IsInstanceSchema (<class 'pandas.core.frame.DataFrame'>)
2024-09-02T07:59:57.164735841Z 
2024-09-02T07:59:57.164739599Z For further information visit https://errors.pydantic.dev/2.8/u/invalid-for-json-schema
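The error above can be reproduced outside Hayhooks. The sketch below assumes the failure comes from a model field typed as an arbitrary class (here a stand-in `Frame` class instead of `pandas.DataFrame`, so the example needs only pydantic v2); the names `Frame`, `ComponentInputs`, and `schema_error` are hypothetical. Validation of such a field works, but OpenAPI generation needs a JSON schema, and pydantic cannot derive one from an isinstance check.

```python
from pydantic import BaseModel, ConfigDict
from pydantic.errors import PydanticInvalidForJsonSchema


class Frame:
    """Stand-in for pandas.DataFrame: a plain class pydantic knows nothing about."""


class ComponentInputs(BaseModel):
    # Arbitrary types are validated with an isinstance check only.
    model_config = ConfigDict(arbitrary_types_allowed=True)
    table: Frame


class PlainInputs(BaseModel):
    query: str


def schema_error(model):
    """Return the JSON-schema generation error message, or None on success."""
    try:
        model.model_json_schema()
    except PydanticInvalidForJsonSchema as exc:
        return str(exc)
    return None


frame_error = schema_error(ComponentInputs)
plain_error = schema_error(PlainInputs)
```

This would explain why the docs page loads again once the pipeline is undeployed: without the offending field in any request or response model, schema generation succeeds.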

After I undeploy the pipeline, http://localhost:1416/docs#/ works again.

Shuntw6096 commented 2 months ago

Is it a bug?

vblagoje commented 2 months ago

Thanks for this write-up @Shuntw6096 - we'll take a look soon. cc @julian-risch