deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Add `ComponentAdapter` for enhanced component/pipeline flexibility in Haystack 2.x #6580

Closed: vblagoje closed this issue 6 months ago

vblagoje commented 9 months ago

Motivation:

In our current Haystack 2.x pipeline implementations, we often encounter situations where outputs from one component or pipeline need to be adapted, transformed, or otherwise bridged to serve as inputs to subsequent components. This typically involves writing custom "bridge code" that manually handles the extraction, transformation, and passing of data. While this approach works, it has several limitations:

Proposal:

To address these challenges, let's consider introducing a ComponentAdapter in 2.x. It would provide a declarative, configurable way to map and transform the outputs of one component to match the input requirements of another. The adapter would be serializable and flexible, making pipelines easier to configure, maintain, and share.
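To make the "declarative and serializable" idea more concrete, here is a minimal sketch of how an adapter's output mapping could be described as plain data and round-tripped through JSON. The names `OutputSpec`, `AdapterConfig`, `to_dict` and `from_dict` are hypothetical and not part of Haystack; storing the output type as a string is an assumption made only to keep the config JSON-friendly.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class OutputSpec:
    """One declarative mapping (hypothetical, not a Haystack class)."""
    output: str        # template expression evaluated against the inputs
    output_name: str   # name of the output exposed to downstream components
    output_type: str = "typing.Any"  # kept as a string so it serializes cleanly


@dataclass
class AdapterConfig:
    """Everything needed to rebuild an adapter (hypothetical)."""
    inputs: List[str]
    outputs: List[OutputSpec] = field(default_factory=list)

    def to_dict(self) -> Dict[str, Any]:
        # asdict() recurses into the nested OutputSpec dataclasses.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "AdapterConfig":
        return cls(
            inputs=list(data["inputs"]),
            outputs=[OutputSpec(**spec) for spec in data["outputs"]],
        )


config = AdapterConfig(
    inputs=["documents"],
    outputs=[
        OutputSpec("{{ documents[0].meta['spec'] }}", "service_openapi_spec"),
        OutputSpec("{{ json.loads(documents[0].content) }}", "functions"),
    ],
)

# Round-trip through JSON, as one would when saving or sharing a pipeline.
serialized = json.dumps(config.to_dict())
restored = AdapterConfig.from_dict(json.loads(serialized))
```

Because the mapping is plain data rather than Python callables, it can be stored next to the rest of a pipeline definition and edited without touching code.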

Benefits:

To get a better feel for this component, consider the following scenario, which is common in non-trivial NLP tasks:

from haystack import Pipeline
from haystack.components import OpenAPIServiceToFunctions, GPTChatGenerator, OpenAPIServiceConnector
from haystack.dataclasses import ChatMessage
import requests
import json

# Initial indexing pipeline
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("spec_to_functions", OpenAPIServiceToFunctions())
results = indexing_pipeline.run(data={"sources": ["https://bit.ly/3tdRUM0"],
                                      "system_messages": [requests.get("https://bit.ly/48eN0ND").text]})

# Manual extraction and transformation
top_1_document = results["spec_to_functions"]["documents"][0]
openai_functions_definition = json.loads(top_1_document.content)
openapi_spec = top_1_document.meta["spec"]

# Second pipeline for service invocation
invoke_service_pipe = Pipeline()
invoke_service_pipe.add_component("functions_llm", GPTChatGenerator(model_name="gpt-3.5-turbo-0613"))
invoke_service_pipe.add_component("openapi_container", OpenAPIServiceConnector())
invoke_service_pipe.connect("functions_llm.replies", "openapi_container.messages")

# Run the second pipeline with manually transformed data
# (user_instruction is assumed to be a string holding the user's request, defined elsewhere)
service_response = invoke_service_pipe.run(data={"messages": [ChatMessage.from_user(user_instruction)],
                                                 "generation_kwargs": {"functions": [openai_functions_definition]},
                                                 "service_openapi_spec": openapi_spec})

And after we introduce such a component:

from typing import Any

from haystack import Pipeline
from haystack.components import OpenAPIServiceToFunctions, GPTChatGenerator, OpenAPIServiceConnector, ComponentAdapter
import requests
import json

# Define ComponentAdapter to automatically transform and pass data
outputs = [
    {
        "output": "{{ documents[0].meta['spec'] }}",
        "output_name": "service_openapi_spec",
        "output_type": Any,
    },
    {
        "output": "{{ json.loads(documents[0].content) }}",
        "output_name": "functions",
        "output_type": Any,
    },
]
adapter = ComponentAdapter(inputs=["documents", "runtime_or_additional_run_input"], outputs=outputs)

# Unified pipeline with ComponentAdapter
pipeline = Pipeline()
pipeline.add_component("spec_to_functions", OpenAPIServiceToFunctions())
pipeline.add_component("adapter", adapter)
pipeline.add_component("functions_llm", GPTChatGenerator(model_name="gpt-3.5-turbo-0613"))
pipeline.add_component("openapi_container", OpenAPIServiceConnector())

# Feed the converter's documents into the adapter, then connect the adapter outputs
pipeline.connect("spec_to_functions.documents", "adapter.documents")
pipeline.connect("adapter.service_openapi_spec", "openapi_container.service_openapi_spec")
pipeline.connect("adapter.functions", "functions_llm.generation_kwargs")
pipeline.connect("functions_llm.replies", "openapi_container.messages")

# Run the pipeline with single data input
results = pipeline.run(data={"sources": ["https://bit.ly/3tdRUM0"],
                             "system_messages": [requests.get("https://bit.ly/48eN0ND").text]})

With ComponentAdapter, this manual process can be replaced by a configurable component.
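For illustration, the run-time behaviour of such an adapter could look roughly like the sketch below: each `{{ ... }}` expression is evaluated against the adapter's inputs and the result is published under its `output_name`. The `render_outputs` helper is hypothetical, the document is a `SimpleNamespace` stand-in rather than a real Haystack `Document`, and Python's `eval` is used only as a stand-in for a proper template engine.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict, List


def render_outputs(outputs: List[Dict[str, Any]], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Evaluate each "{{ ... }}" expression against the adapter inputs.

    Illustration only: eval() stands in for a real template engine and must
    not be used on untrusted expressions.
    """
    # Names the expressions may refer to: the inputs plus the json module.
    namespace = {"json": json, **inputs}
    results: Dict[str, Any] = {}
    for spec in outputs:
        expression = spec["output"].strip()
        # Drop the surrounding template delimiters, if present.
        if expression.startswith("{{") and expression.endswith("}}"):
            expression = expression[2:-2].strip()
        results[spec["output_name"]] = eval(expression, {"__builtins__": {}}, namespace)
    return results


# Stand-in for a document with `content` and `meta` attributes.
doc = SimpleNamespace(content='{"name": "search"}', meta={"spec": {"openapi": "3.0.0"}})

outputs = [
    {"output": "{{ documents[0].meta['spec'] }}", "output_name": "service_openapi_spec"},
    {"output": "{{ json.loads(documents[0].content) }}", "output_name": "functions"},
]

results = render_outputs(outputs, {"documents": [doc]})
```

A real implementation would presumably delegate expression evaluation to a sandboxed template engine and validate each result against its declared `output_type`; both are left out here to keep the sketch short.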

Describe alternatives you've considered

So far I have always resorted to manual "bridging code", or planned to write meta components in the future.

Additional context

ComponentAdapter offers a simple solution for the easier cases of data transformation and bridging in our NLP pipelines. That lets us reserve custom "meta components" for more complex scenarios that need advanced data manipulation, intricate exception handling, or specialized processing. Previously, we planned to use meta components even for relatively trivial bridging tasks, which risked overengineering and unnecessary complexity. With ComponentAdapter, users get a more streamlined option for basic transformations, which simplifies pipeline construction and keeps the design cleaner and more focused, while meta components are used only where their full capabilities are essential.

anakin87 commented 6 months ago

Seems like a duplicate of #6938. Feel free to reopen if I am wrong.

vblagoje commented 6 months ago

No, you are right @anakin87 - completed!