deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0
16.94k stars · 1.85k forks

Parallel LLM Support for maximal GPU Usage #8257

Open Raina-Hardik opened 1 month ago

Raina-Hardik commented 1 month ago

The current implementation of Haystack pipelines does not support parallel execution of LLMs. Much like how the DocumentWriter uses multiple streams when writing to the DocumentStore, running multiple LLMs in parallel should be an option when enough GPU capacity is available. For workflows that need to parse a wide variety of documents or modalities, being able to spread work across multiple LLMs and push GPU utilization to the maximum would be critical.
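As a rough illustration of the idea (not Haystack's actual API), the fan-out could be sketched with a thread pool that dispatches each prompt to its own generator instance, where each instance would hypothetically be pinned to a different GPU. The `run_llms_in_parallel` helper and the stand-in generator callables below are assumptions for the sketch:

```python
from concurrent.futures import ThreadPoolExecutor

def run_llms_in_parallel(generators, prompts):
    """Dispatch each prompt to its own generator concurrently.

    `generators` is a list of callables (in a real pipeline these would be
    LLM generator components, e.g. one per GPU); `prompts` pairs up with
    them positionally.
    """
    with ThreadPoolExecutor(max_workers=len(generators)) as pool:
        futures = [pool.submit(gen, prompt)
                   for gen, prompt in zip(generators, prompts)]
        # Collect results in submission order.
        return [f.result() for f in futures]

# Stand-in "LLMs": plain functions tagging the prompt with a fake GPU id.
# In practice these would be real generator components bound to devices.
gens = [lambda p, i=i: f"gpu{i}:{p}" for i in range(2)]
print(run_llms_in_parallel(gens, ["summarize A", "summarize B"]))
# → ['gpu0:summarize A', 'gpu1:summarize B']
```

The same pattern could back a pipeline-level option: the scheduler detects branches whose components have no data dependency and runs them on separate workers instead of sequentially.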

Solution

Possible Use Cases

Proof of Concept
