deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0
17.09k stars 1.87k forks

Add a reload pipeline endpoint in REST API #3175

Closed danielbichuetti closed 4 months ago

danielbichuetti commented 2 years ago

Is your feature request related to a problem? Please describe. Haystack loads the pipelines once, during REST API application setup. If the user wants to change a pipeline, they have to restart the application (or possibly the container) or do a pod rollout.

Describe the solution you'd like My solution would be to implement an endpoint (/reload) that reloads the pipelines, similar to Nginx's reload signal mechanism for reloading configurations. This change would make it possible to use volume mounts to store the YAML: when the user changes the file, they call the /reload endpoint. On Kubernetes, this is very useful as a combo of ConfigMap + reload mechanism: a pipeline change could be done by simply rewriting the ConfigMap and triggering the reload on the REST API.
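The reload flow described above could be sketched roughly like this (a minimal Python sketch with hypothetical names: `load_pipeline` stands in for Haystack's actual YAML loading, and `RELOAD_ENABLED` is an assumed env var, not an existing setting):

```python
import os
import threading

# Hypothetical sketch: the active pipeline lives behind a lock so that
# in-flight requests never see a half-swapped object.
_lock = threading.Lock()
_pipeline = None


def load_pipeline(path: str) -> dict:
    # Stand-in for the real YAML loading step.
    return {"loaded_from": path}


def reload_pipeline(path: str = "pipeline.yaml") -> dict:
    """Handler body for a hypothetical POST /reload endpoint."""
    if os.getenv("RELOAD_ENABLED", "false").lower() != "true":
        raise PermissionError("reload endpoint is disabled")
    global _pipeline
    new_pipeline = load_pipeline(path)  # re-read the (re)mounted YAML
    with _lock:
        _pipeline = new_pipeline  # atomic swap under the lock
    return _pipeline
```

In a Kubernetes setup, the YAML would come from a mounted ConfigMap; updating the ConfigMap and then calling the endpoint swaps the pipeline without a pod rollout.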

Describe alternatives you've considered Current alternatives for reloading the pipeline are:

Additional context

ZanSara commented 2 years ago

Hey @danielbichuetti! In general I agree this feature could be useful, so you can go ahead and implement it.

On the other hand, right now there's no way to modify the file through the API itself, so the usefulness of such an endpoint is a bit limited: to edit the file you must have access to the machine/container running the server, and if you have access to that, you might as well just restart the process instead of calling the /reload endpoint.

In addition, given that the REST API is used in our public Haystack demo, it would be cool if you could only enable such an endpoint via an env var :smile:

If you go ahead with this, a useful follow-up could be an endpoint that allows users to submit new pipelines. Let's open a to-do issue for that in case!

danielbichuetti commented 2 years ago

@ZanSara You made an interesting point regarding Docker containers.

When I first conceived of it, I was focused on Kubernetes deployments. On Kubernetes, you can use mapped volume mounts such as ConfigMaps (this is how we do it at our company), but then you need to roll out the pod again, which can take some seconds. A reload endpoint would eliminate that need.

But it could be further improved to provide wider benefits (e.g., to Docker users).

So, what do you think about an endpoint like /pipelines/management/ (protected by an API key, which is set up by an env var) where users can:

And, of course, the first proposed /pipelines/reload or /reload.

Would the management endpoint be too much? Or, rather than just an update endpoint, would a full management endpoint be better?
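The env-var-based API key protection mentioned above could take a shape like this (a sketch; `PIPELINE_MANAGEMENT_API_KEY` is a hypothetical variable name, not an existing Haystack setting):

```python
import hmac
import os


def is_authorized(provided_key: str) -> bool:
    # Hypothetical guard for the /pipelines/management/ routes: the
    # expected key comes from an env var, and the comparison is done in
    # constant time to avoid timing leaks. An unset/empty env var means
    # the management endpoints stay disabled.
    expected = os.getenv("PIPELINE_MANAGEMENT_API_KEY", "")
    return bool(expected) and hmac.compare_digest(provided_key, expected)
```

Each management route would call this check before touching the pipeline YAML.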

I've also noticed that the way the REST API is implemented is not thread-safe when using the fast tokenizers from Hugging Face. I feel that changing the architecture would prevent these issues and improve performance. Currently, an env var sets the maximum number of concurrent requests, and the Docker container configures gunicorn with 1 worker, which leads to multithreading. What about switching to multiprocessing? The env var would then control the number of workers, and we would limit each worker to 1 request at a time. In gunicorn, each worker is a sub-process, which is why this is thread-safe. The usual recommendation for API inference endpoints that use the Rust tokenizers is multiprocessing. It's a balance either way: when you increase concurrent requests, you slow down all requests in favor of more parallelism.
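The one-request-per-worker-process idea could look something like this in a gunicorn config file (a sketch, not Haystack's actual config; `HAYSTACK_WORKERS` is an assumed env var name, and the worker class depends on how the FastAPI app is served):

```python
# gunicorn.conf.py -- hypothetical sketch
import os

# One OS process per worker: since gunicorn workers are separate
# processes, the fast (Rust) tokenizers are never shared across
# threads, so each worker stays thread-safe.
workers = int(os.getenv("HAYSTACK_WORKERS", "4"))

# Limit each worker to one request at a time.
threads = 1

# Typical worker class for serving a FastAPI/ASGI app under gunicorn.
worker_class = "uvicorn.workers.UvicornWorker"
```

Scaling then happens by raising the env var, trading memory (one model copy per process) for isolation and throughput.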

An easier solution would be to switch from the fast tokenizers to the slow (Python) tokenizers. But as Haystack is focused on NLP, sentence tokenization performance would suffer. (IMHO, I don't like this solution.)

ZanSara commented 2 years ago

Honestly, sounds good! I like the idea of introducing pipeline management to the REST API. It doesn't need to be a single endpoint either; you can make a few if it looks tidier. The only reason it's not there yet is that these APIs are only used for the demo and no one ever felt the need for it :smile:

I wonder if this is too much work for a single PR, but if it grows too big feel free to split it into smaller ones referencing each other.

Regarding concurrency problems, you're totally right. We even have a really tricky issue reported on this topic: https://github.com/deepset-ai/haystack/issues/3093 We should keep it in mind and test your changes against this vanishing bug too. Maybe they'll solve it! I also agree that we should keep the fast tokenizers as much as possible.

danielbichuetti commented 2 years ago

I'll start this PR ASAP. Thank you! :smile:

Many users are trying to use rest_api as the default usage pattern for Haystack. I think improving it would be nice; maybe we can make it production-grade over time.

nickchomey commented 2 years ago

I deeply appreciate what this proposal has evolved into, as I am one of the users who would like to use the REST API as the default usage pattern. It would simply be crazy for me (or, frankly, almost any user) to essentially create my own custom FastAPI application that just uses bits and pieces of Haystack, which itself is ever-changing...

But I think that this effort to add pipeline management to the REST API might be best approached after consideration of the broader context/architecture for the REST API and Pipelines, which is already on the Roadmap.

I wrote considerably more about all of this in this discussion on Best practices for extending/customizing the Haystack REST API. As mentioned there, I think it would be a perfect topic for the next Contributor Cafe or, better yet, an open RFP discussion for the deepset team and community. As it stands, Haystack seems to be more of a toolkit than a framework for adding AI-powered search to your application (there's an excellent discussion of these terms in this Stack Overflow thread). It seems to me that this makes it very unapproachable and impractical for most potential users, which is very unnecessary and unfortunate.

masci commented 4 months ago

This is now possible in hayhooks: https://github.com/deepset-ai/hayhooks