deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0
16.68k stars, 1.83k forks

[2.0] Component names matter - but is this a feature? #6512

Closed: lfunderburk closed this issue 3 months ago

lfunderburk commented 9 months ago

Advent of Haystack Day 1 and 2

Describe the bug

Giving components names that differ from their instance variable names causes the pipeline to malfunction. This works:

from haystack import Pipeline

# fetcher, converter, splitter, prompt_builder and llm are assumed to be created in earlier notebook cells
pipeline = Pipeline()
pipeline.add_component(name="fetcher", instance=fetcher)
pipeline.add_component(name="converter", instance=converter)
pipeline.add_component(name="splitter", instance=splitter)
pipeline.add_component(name="prompt_builder", instance=prompt_builder)
pipeline.add_component(name="llm", instance=llm)

pipeline.connect("fetcher", "converter")
pipeline.connect("converter", "splitter")
pipeline.connect("splitter", "prompt_builder")
pipeline.connect("prompt_builder", "llm")

query_dict = {
    "urls": ["https://haystack.deepset.ai/blog/customizing-rag-to-summarize-hacker-news-posts-with-haystack2"],
    "query": "How do you build a custom component?"
}

# Run data is keyed by the component names given to add_component
result = pipeline.run(data={"fetcher": {"urls": query_dict["urls"]}, "prompt_builder": {"query": query_dict["query"]}})

Now suppose I give the splitter instance the name "preprocessor" instead:

from haystack import Pipeline

# Same instances as above; only the splitter is registered under a different name
pipeline = Pipeline()
pipeline.add_component(name="fetcher", instance=fetcher)
pipeline.add_component(name="converter", instance=converter)
pipeline.add_component(name="preprocessor", instance=splitter)
pipeline.add_component(name="prompt_builder", instance=prompt_builder)
pipeline.add_component(name="llm", instance=llm)

pipeline.connect("fetcher", "converter")
pipeline.connect("converter", "preprocessor")
pipeline.connect("preprocessor", "prompt_builder")
pipeline.connect("prompt_builder", "llm")

query_dict = {
    "urls": ["https://haystack.deepset.ai/blog/customizing-rag-to-summarize-hacker-news-posts-with-haystack2"],
    "query": "How do you build a custom component?"
}

# This is the call that raises the error below
result = pipeline.run(data={"fetcher": {"urls": query_dict["urls"]}, "prompt_builder": {"query": query_dict["query"]}})

This causes the following error:

Error message

ValueError                                Traceback (most recent call last)
/Users/macpro/Documents/GitHub/Advent-of-Haystack/day1-rag-from-website/notebooks/solution-Advent_of_Haystack_Pipeline_Connecting.ipynb Cell 15 line 7
      1 query_dict ={
      2     "urls": ["https://haystack.deepset.ai/blog/customizing-rag-to-summarize-hacker-news-posts-with-haystack2"],
      3     "query": "How do you build a custom component?"
      4 }
----> 7 result = pipeline.run(data={"fetcher": {"urls": query_dict["urls"]}, "prompt_builder": {"query": query_dict["query"]}})

File ~/anaconda3/envs/advent-haystack/lib/python3.10/site-packages/haystack/pipeline.py:85, in Pipeline.run(self, data, debug)
     83 is_nested_component_input = all(isinstance(value, dict) for value in data.values())
     84 if is_nested_component_input:
---> 85     return self._run_internal(data=data, debug=debug)
     86 else:
     87     # flat input, a dict where keys are input names and values are the corresponding values
     88     # we need to convert it to a nested dictionary of component inputs and then run the pipeline
     89     # just like in the previous case
     90     pipeline_inputs, unresolved_inputs = self._prepare_component_input_data(data)

File ~/anaconda3/envs/advent-haystack/lib/python3.10/site-packages/haystack/pipeline.py:111, in Pipeline._run_internal(self, data, debug)
    100 """
    101 Runs the pipeline by invoking the underlying run to initiate the pipeline execution.
    102 
   (...)
    108 :raises PipelineRuntimeError: if any of the components fail or return unexpected output.
...
- prompt_builder:
    - documents: Any
- llm:
    - generation_kwargs: Optional[Dict[str, Any]]

Expected behavior

I understood that the name was only used for drawing the pipeline, but changing it seems to cause issues. If the name is meant to be fixed, then the name parameter is not needed.
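A toy, pure-Python sketch of the behaviour one might expect here: the name passed to add_component is just a registry key, unrelated to the Python variable that holds the instance, so renaming should work as long as every connect() and run() call uses the same key. MiniPipeline, split_words and shout are hypothetical and are not Haystack's implementation.

```python
from typing import Any, Callable, Dict, List, Tuple


class MiniPipeline:
    """Hypothetical toy pipeline: components are stored under caller-chosen names."""

    def __init__(self) -> None:
        self.components: Dict[str, Callable[[Any], Any]] = {}
        self.edges: List[Tuple[str, str]] = []

    def add_component(self, name: str, instance: Callable[[Any], Any]) -> None:
        # The name is only a dictionary key; the caller's variable name is never visible here.
        if name in self.components:
            raise ValueError(f"A component named '{name}' already exists")
        self.components[name] = instance

    def connect(self, sender: str, receiver: str) -> None:
        # Both ends must refer to names registered via add_component.
        for name in (sender, receiver):
            if name not in self.components:
                raise ValueError(f"Unknown component '{name}'")
        self.edges.append((sender, receiver))

    def run(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Toy simplification: a single linear chain, entered at the one component named in `data`.
        if len(data) != 1:
            raise ValueError("This toy pipeline accepts input for exactly one entry component")
        current, value = next(iter(data.items()))
        while True:
            value = self.components[current](value)
            successors = [r for s, r in self.edges if s == current]
            if not successors:
                return {current: value}
            current = successors[0]


def split_words(text: str) -> List[str]:
    return text.split()


def shout(words: List[str]) -> List[str]:
    return [w.upper() for w in words]


splitter = split_words  # the variable is called "splitter"...
p = MiniPipeline()
p.add_component(name="preprocessor", instance=splitter)  # ...but it is registered as "preprocessor"
p.add_component(name="shout", instance=shout)
p.connect("preprocessor", "shout")
print(p.run({"preprocessor": "hello haystack"}))  # → {'shout': ['HELLO', 'HAYSTACK']}

try:
    p.connect("splitter", "shout")  # the variable name is not a registered component name
except ValueError as err:
    print(err)  # → Unknown component 'splitter'
```

In this toy, the only thing that breaks a rename is referring to the component by its old name (or its variable name) somewhere in connect() or run().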

Additional context

Advent of Haystack Day 2

To Reproduce

Add and connect components using a name that differs from the instance's variable name.

FAQ Check

System:

silvanocerza commented 8 months ago

@lfunderburk I can't seem to reproduce the issue. Based on the reproduction steps you provided, this should fail, but it doesn't:

from haystack import Pipeline
from haystack.components.fetchers import LinkContentFetcher
from haystack.components.converters import HTMLToDocument

fetcher = LinkContentFetcher()
converter = HTMLToDocument()

pipeline = Pipeline()
pipeline.add_component(name="foo", instance=fetcher)
pipeline.add_component(name="bar", instance=converter)

pipeline.connect("foo", "bar")
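When connect() is given only component names, as above, the sockets to join have to be inferred from the components' declared outputs and inputs, so the names "foo" and "bar" play no part in that step. The function below is a hypothetical, simplified sketch of such inference (exact type equality, exactly one matching pair required). The socket names only loosely mirror a fetcher and a converter, bytes stands in for the real stream type, and Haystack's actual matching and type-compatibility rules may be more lenient.

```python
from typing import Dict, List, Tuple

# Hypothetical socket declarations: socket name -> declared type.
Sockets = Dict[str, object]


def resolve_sockets(sender_outputs: Sockets, receiver_inputs: Sockets) -> Tuple[str, str]:
    """Return the single (output, input) socket pair whose declared types are equal.

    Toy rule: exact type equality; a real framework may also accept compatible types.
    """
    pairs = [
        (out_name, in_name)
        for out_name, out_type in sender_outputs.items()
        for in_name, in_type in receiver_inputs.items()
        if out_type == in_type
    ]
    if len(pairs) != 1:
        raise ValueError(f"Expected exactly one matching socket pair, found {len(pairs)}: {pairs}")
    return pairs[0]


# The component names ("foo", "bar") never enter the resolution; only socket types do.
foo_outputs = {"streams": List[bytes]}               # stand-in for a fetcher's output
bar_inputs = {"sources": List[bytes], "meta": dict}  # stand-in for a converter's inputs
print(resolve_sockets(foo_outputs, bar_inputs))      # → ('streams', 'sources')
```

With this design an ambiguous pair of components (two or more type-compatible sockets) raises an error instead of guessing, which is when spelling out "component.socket" explicitly becomes necessary.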

I can't pinpoint the exact issue here either. If you could provide a snippet that reliably reproduces the issue, that would be great. Even a Colab notebook is fine.