deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Component with Variadic input is not run if some of its inputs are not sent #8341

Closed silvanocerza closed 2 weeks ago

silvanocerza commented 2 weeks ago

Describe the bug

Pipeline.run() doesn't run a Component with a Variadic input if some of its senders don't send it any input.

Expected behavior

Pipeline.run() runs a Component with a Variadic input even when only some of its senders send it input.

Additional context

This has been reported by a user on Discord in this thread.

To Reproduce

The snippet below reproduces the issue: both asserts fail even though they shouldn't.

In both cases below the joiner doesn't run.

That's unexpected and must be fixed.

from typing import List

from haystack import Document, Pipeline, component
from haystack.components.joiners import DocumentJoiner

document_joiner = DocumentJoiner()

@component
class ConditionalDocumentCreator:
    def __init__(self, content: str):
        self._content = content

    @component.output_types(documents=List[Document], noop=None)
    def run(self, create_document: bool = False):
        if create_document:
            return {"documents": [Document(id=self._content, content=self._content)]}
        return {"noop": None}

pipeline = Pipeline()
pipeline.add_component("first_creator", ConditionalDocumentCreator(content="First document"))
pipeline.add_component("second_creator", ConditionalDocumentCreator(content="Second document"))
pipeline.add_component("third_creator", ConditionalDocumentCreator(content="Third document"))
pipeline.add_component("joiner", document_joiner)

pipeline.connect("first_creator.documents", "joiner.documents")
pipeline.connect("second_creator.documents", "joiner.documents")
pipeline.connect("third_creator.documents", "joiner.documents")

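# Only first_creator and third_creator are asked to create a document;
# second_creator is expected to run with its default and emit only `noop`.
output = pipeline.run(data={"first_creator": {"create_document": True}, "third_creator": {"create_document": True}})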

print(output)
assert output == {
    "second_creator": {"noop": None},
    "joiner": {
        "documents": [
            Document(id="First document", content="First document"),
            Document(id="Third document", content="Third document"),
        ]
    },
}

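# Same check with a different sender left out: third_creator is expected
# to run with its default and emit only `noop`.
output = pipeline.run(data={"first_creator": {"create_document": True}, "second_creator": {"create_document": True}})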
print(output)
assert output == {
    "third_creator": {"noop": None},
    "joiner": {
        "documents": [
            Document(id="First document", content="First document"),
            Document(id="Second document", content="Second document"),
        ]
    },
}
silvanocerza commented 2 weeks ago

With the subgraph rework discussed in #8339, the above snippet works only in part.

If only first_creator and third_creator send the joiner input, it runs. If only first_creator and second_creator send it input, it doesn't run.

A separate fix might be required for the rework.
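
The expectation in this report can be stated as a small readiness rule. The sketch below is a toy model in plain Python, not Haystack's actual scheduling code. The function names `strict_ready` and `lenient_ready` and the `finished`/`received` bookkeeping are hypothetical, introduced only for illustration. It contrasts a rule that waits for every sender to deliver a value with one that runs the variadic input once every sender has finished and at least one has delivered something.

```python
from typing import Dict, List, Set


def strict_ready(senders: Set[str], received: Dict[str, List[str]]) -> bool:
    """Hypothetical rule: wait until every sender has delivered a value."""
    return all(sender in received for sender in senders)


def lenient_ready(
    senders: Set[str], finished: Set[str], received: Dict[str, List[str]]
) -> bool:
    """Hypothetical rule: run once all senders are done and at least one delivered."""
    all_done = senders <= finished
    any_delivered = any(sender in received for sender in senders)
    return all_done and any_delivered


# Mirrors the first run in the report: second_creator finishes but only emits `noop`.
senders = {"first_creator", "second_creator", "third_creator"}
finished = set(senders)
received = {
    "first_creator": ["First document"],
    "third_creator": ["Third document"],
}

print(strict_ready(senders, received))             # False
print(lenient_ready(senders, finished, received))  # True
```

Under the strict rule the joiner would never run in either scenario from the snippet, which matches the reported behavior. The lenient rule matches what the asserts expect.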