Open ju-gu opened 5 days ago
I briefly investigated by bisecting. The last commit where this Pipeline works is https://github.com/deepset-ai/haystack/commit/badb05b3abb09fa190049b31b975365d69dd0112; the bug seems to be introduced by the commit right after it, https://github.com/deepset-ai/haystack/commit/83d3970405085aae5b22dc0f715398077f1f71fc.
Seems like the changes to `PromptBuilder` in #7655 surfaced this bug.
I'm still not sure what's the actual cause and will keep investigating.
A temporary workaround is adding `required_variables` to the `PromptBuilder`s as done below; this makes the Pipeline run as expected.
```python
prompt_builder2 = PromptBuilder(template=prompt_template2, required_variables=["documents", "question"])
prompt_builder3 = PromptBuilder(template=prompt_template3, required_variables=["replies"])
```
Another solution could be changing the order in which the `PromptBuilder`s are added to the Pipeline:
```python
pipeline.add_component(name="prompt_builder1", instance=prompt_builder1)
pipeline.add_component(name="prompt_builder3", instance=prompt_builder3)
pipeline.add_component(name="prompt_builder2", instance=prompt_builder2)
```
This problem is caused by a combination of factors: the way we decide which Component to run next, the fact that Component addition order influences the run order, and the way we treat Components whose inputs all have defaults.
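The interaction of those three factors can be sketched with a toy model (an assumed simplification, not Haystack's actual scheduling code): when a `PromptBuilder` is created without `required_variables`, its template variables get defaults, so a readiness check considers it runnable immediately, and insertion order then decides which "ready" component runs first.

```python
def has_enough_inputs(expected, received):
    """A component is 'ready' once every input without a default has a value.
    Components whose inputs ALL have defaults are therefore always 'ready'."""
    return all(has_default or name in received
               for name, has_default in expected.items())

# Inputs map: name -> has_default. Without required_variables, the template
# variables become optional, so both builders look runnable immediately.
components = {  # dict insertion order models Pipeline addition order
    "prompt_builder2": {"documents": True, "question": True},
    "prompt_builder3": {"replies": True},
}

def next_ready(components, received):
    # Pick the first 'ready' component in insertion order.
    for name, expected in components.items():
        if has_enough_inputs(expected, received.get(name, set())):
            return name
    return None

# No real inputs have arrived yet, but both builders report 'ready',
# so the first one added wins and runs with default (empty) values.
print(next_ready(components, {}))  # -> prompt_builder2

# With required_variables, the same inputs have no defaults, so neither
# builder is ready until its actual inputs arrive.
strict = {
    "prompt_builder2": {"documents": False, "question": False},
    "prompt_builder3": {"replies": False},
}
print(next_ready(strict, {}))  # -> None
```

In this toy model, both workarounds from above fall out naturally: marking variables as required removes the spurious readiness, and reordering the `add_component` calls changes which "ready" component is picked first.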
Ideally, the fix would change how we decide which Components to run so that it's independent of the other two factors, and it also shouldn't break existing use cases. Not sure how easy that will be. 😕
**Describe the bug**
With a more complex pipeline, the run order fails: the first node is not identified, and the `documents` and query inputs are set to empty strings. Nodes are then executed multiple times, overwriting these wrong intermediary outputs again during runtime.
The point of failure is in the `_component_has_enough_inputs_to_run` method of pipeline.py: the expected inputs for prompt_builder1 are `question`, `template`, and `template_variables`, while the input parameters are just `question`, resulting in the function returning `False`. Later, a different component is executed with "default" values, which are all None / empty strings. However, the template is already parsed upon instantiation of the prompt builder, and the `template_variables` just include the `question` passed in the run method, so there should be no mismatch between expected and input parameters. Passing `template` and `template_variables` in the run method resolves this issue (it shouldn't be needed, though).

Output of the sample pipeline (nodes are executed multiple times, starting with the second LLM):
**To Reproduce**
Run this pipeline and check the execution order.
test_data.zip