deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Parallel LLM Support for maximal GPU Usage #8257

Open Raina-Hardik opened 2 months ago

Raina-Hardik commented 2 months ago

The current implementation of Haystack Pipelines does not support parallel execution of LLMs. Much like how the DocumentWriter uses multiple streams for writing to the DocumentStore, running multiple LLMs in parallel should be an option, provided there is enough GPU availability. For workflows that require parsing a wide variety of documents or modalities, the option to use multiple LLMs to drive GPU usage to the maximum would be critical.
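A minimal sketch of the idea, outside of Haystack itself: prompts are round-robined across several generator instances and dispatched concurrently with `concurrent.futures`. The `make_generator` factory is a hypothetical stand-in for constructing real generator components pinned to different devices (e.g. `device="cuda:0"` / `device="cuda:1"`); the actual component names and pipeline integration would depend on how the feature lands.

```python
from concurrent.futures import ThreadPoolExecutor

def make_generator(gpu_id):
    # Hypothetical stand-in for an LLM generator pinned to one GPU;
    # a real setup would instantiate a Haystack generator component
    # with the appropriate device argument instead.
    def generate(prompt):
        # Placeholder for an actual model call executed on this GPU.
        return f"[gpu{gpu_id}] reply to: {prompt}"
    return generate

generators = [make_generator(i) for i in range(2)]
prompts = [
    "summarize document A",
    "summarize document B",
    "summarize document C",
    "summarize document D",
]

# Round-robin prompts across the generators and run them in parallel
# threads, so every GPU stays busy instead of prompts queuing serially
# behind a single model instance.
with ThreadPoolExecutor(max_workers=len(generators)) as pool:
    futures = [
        pool.submit(generators[i % len(generators)], prompt)
        for i, prompt in enumerate(prompts)
    ]
    results = [f.result() for f in futures]

print(results)
```

With real model-backed generators, the thread pool would typically be replaced by whatever scheduling mechanism the pipeline runner exposes, but the dispatch pattern stays the same.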

Solution

Possible Use Cases

Proof of Concept