deepset-ai / hayhooks

Deploy Haystack pipelines behind a REST API.
https://haystack.deepset.ai
Apache License 2.0

Support file upload #22

Open masci opened 4 months ago

masci commented 4 months ago

At the moment it's extremely hard to use Hayhooks for indexing pipelines, as they either accept:

Since Hayhooks is in control of the request payload for pipeline endpoints, one possible solution might be accepting multipart form data whenever the input of a pipeline is of type path or bytestream. Hayhooks would receive the file and take care of temporarily storing it server-side, or passing bytes on-the-fly to the pipeline.
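A minimal sketch of the "store temporarily or pass bytes on the fly" idea, using only the standard library. It is not Hayhooks code: `stored_upload`, `run_with_upload`, `run_pipeline` and the `wants_path` flag are hypothetical names, and the uploaded bytes are assumed to have already been extracted from the multipart request by the web framework.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Any, Callable, Iterator


@contextmanager
def stored_upload(data: bytes, filename: str) -> Iterator[Path]:
    """Write uploaded bytes to a temporary file and delete it on exit."""
    # Keep only the extension of the client-supplied name: converters often
    # rely on it, but the rest of the name should not be trusted as a path.
    suffix = Path(filename).suffix
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / f"upload{suffix}"
        path.write_bytes(data)
        yield path
    # Leaving the TemporaryDirectory block removes the file and its directory.


def run_with_upload(
    run_pipeline: Callable[[Any], Any],  # hypothetical pipeline entry point
    data: bytes,
    filename: str,
    wants_path: bool,  # assumption: derived from the pipeline's declared input type
) -> Any:
    """Hand the upload to the pipeline either as a file path or as raw bytes."""
    if wants_path:
        with stored_upload(data, filename) as path:
            return run_pipeline(path)
    return run_pipeline(data)
```

With this shape the temporary file only lives for the duration of the pipeline run, so nothing accumulates on the server between requests.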

OscarIntellico commented 2 months ago

I confirm this issue.

I have an indexing pipeline. The pipeline accepts documents and then indexes them into an Elasticsearch database.

I created a test pipeline with a single DocumentCleaner that expects a list of documents and I have the following two problems:

  1. The /docs endpoint returns an "Internal Server Error /openapi.json" error. The server logs show the following error:

pydantic.errors.PydanticInvalidForJsonSchema: Cannot generate a JsonSchema for core_schema.IsInstanceSchema (<class 'pandas.core.frame.DataFrame'>)

  2. When I call the endpoint localhost:1416/doc_cleaner with a list of documents, using curl or Python requests, I get this error:

TypeError: DocumentCleaner expects a List of Documents as input.
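One plausible reading of the second error, offered as an assumption rather than a confirmed diagnosis: a JSON request body arrives as plain dictionaries, while the component checks for document objects. The sketch below reproduces that mismatch with a stand-in `Document` dataclass (not the Haystack class) and a hypothetical `documents_from_payload` helper that converts the payload before the component runs.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Stand-in document type for illustration; not the Haystack class."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def clean(documents: List[Document]) -> List[Document]:
    """Toy cleaner that rejects anything other than a list of Document objects."""
    # Mimics the kind of isinstance check that would raise the TypeError above.
    if not isinstance(documents, list) or (
        documents and not isinstance(documents[0], Document)
    ):
        raise TypeError("expects a List of Documents as input.")
    # Collapse runs of whitespace in each document's content.
    return [Document(" ".join(d.content.split()), d.meta) for d in documents]


def documents_from_payload(payload: List[Dict[str, Any]]) -> List[Document]:
    """Hypothetical helper: turn decoded JSON dictionaries into Document objects."""
    return [
        Document(content=item["content"], meta=item.get("meta", {}))
        for item in payload
    ]
```

Passing the decoded JSON list straight to `clean` raises the `TypeError`, while `clean(documents_from_payload(payload))` succeeds.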