deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Migrate Components to Pipeline v2 #5265

Closed julian-risch closed 1 year ago

julian-risch commented 1 year ago

We are working on Haystack 2.0, with a major refactoring of pipelines and components.

Rationale

We need to prioritize the list of components and, separately, the list of document stores to migrate to pipelines v2. The riskiest components and the components essential to most pipelines should be migrated first. Let's also find out which components are most relevant to Sol (@sjrl), so that they can give early feedback based on real use cases, and use telemetry data to see which components matter most to the community.

Use cases

List of the use cases to support, in priority order, each with the bare minimum of components required for it to work. Note: to work, every pipeline needs the components of all the pipelines above it in the priority order.

Each "component type" links to another small epic where the specific component is broken down into a set of requirements, which might eventually be covered by one or more v2 components.
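The note about priority order implies that requirements accumulate down the list. Below is a minimal sketch of that rule, not part of Haystack: the `USE_CASES` table and the `cumulative_requirements` helper are hypothetical names introduced for illustration, and the numbers are the issue and PR numbers listed under each use case in this issue.

```python
# Use cases in priority order, mapped to the issue/PR numbers listed under each.
# Hypothetical table for illustration only; it is not a Haystack data structure.
USE_CASES = [
    ("Document Search", [5311, 5390, 5312, 5326]),
    ("Generative QA & Agent Pipelines", [5330, 5614]),
    ("Extractive QA", [5430]),
    ("Minimal Indexing", [5339, 5363, 5581]),
    ("General Indexing", [5362, 5367, 5366]),
    ("Advanced querying", [5626]),
]


def cumulative_requirements(use_cases):
    """Return, per use case, its own items plus those of every use case above it."""
    required = []  # running list of everything needed so far, in priority order
    result = {}
    for name, own in use_cases:
        # Append only items not already required by a higher-priority use case.
        required = required + [item for item in own if item not in required]
        result[name] = list(required)
    return result


if __name__ == "__main__":
    for name, items in cumulative_requirements(USE_CASES).items():
        print(f"{name}: {len(items)} items required")
```

Under this reading, a use case lower in the list is unblocked only once every item above it is done, which is why the order of migration matters.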

1. Document Search

- [ ] https://github.com/deepset-ai/haystack/issues/5311
- [ ] https://github.com/deepset-ai/haystack/pull/5390
- [ ] https://github.com/deepset-ai/haystack/issues/5312
- [ ] https://github.com/deepset-ai/haystack/issues/5326

Note: Planning for Retrievers and Embedders will follow the Docstores.

2. Generative QA & Agent Pipelines

### Tasks
- [ ] https://github.com/deepset-ai/haystack/issues/5330
- [ ] https://github.com/deepset-ai/haystack/issues/5614

3. Extractive QA

### Tasks
- [ ] https://github.com/deepset-ai/haystack/issues/5430

4. Minimal Indexing

### Tasks
- [ ] https://github.com/deepset-ai/haystack/issues/5339
- [ ] https://github.com/deepset-ai/haystack/issues/5363
- [ ] https://github.com/deepset-ai/haystack/issues/5581

6. General Indexing

### Tasks
- [ ] https://github.com/deepset-ai/haystack/issues/5362
- [ ] https://github.com/deepset-ai/haystack/issues/5367
- [ ] https://github.com/deepset-ai/haystack/issues/5366

7. Advanced querying

### Tasks
- [ ] https://github.com/deepset-ai/haystack/issues/5626

Agent Pipelines

Agent pipelines will need a bit of exploration to get right. I expect their main enabler to be the LLM component; any other unforeseen component that turns out to be needed here will be prioritized accordingly.

Other

### Tasks
- [ ] https://github.com/deepset-ai/haystack/issues/5341
- [ ] https://github.com/deepset-ai/haystack/issues/5429
- [ ] https://github.com/deepset-ai/haystack/issues/5672
- [ ] https://github.com/deepset-ai/haystack/pull/5390
- [ ] https://github.com/deepset-ai/haystack/issues/5579
- [ ] https://github.com/deepset-ai/haystack/issues/5342
- [ ] https://github.com/deepset-ai/haystack/issues/5339
- [ ] https://github.com/deepset-ai/haystack/issues/5311
- [ ] https://github.com/deepset-ai/haystack/issues/5430
- [ ] https://github.com/deepset-ai/haystack/issues/5627
- [x] https://github.com/deepset-ai/haystack/issues/5628
- [ ] https://github.com/deepset-ai/haystack/issues/5915
- [ ] Finetuning of LLMs in Pipelines 2.0
- [ ] Finetuning of Retriever & Reader models in Pipelines 2.0

Developer relations efforts

Context

masci commented 1 year ago

Unfinished items were added to the roadmap for Q4; closing this one as complete.