Add `ComponentAdapter` for enhanced component/pipeline flexibility in Haystack 2.x

Motivation:

In our current Haystack 2.x pipeline implementations, we often encounter situations where outputs from one component or pipeline need to be adapted, transformed, or otherwise bridged to serve as inputs to subsequent components. This typically involves writing custom "bridge code" that manually handles the extraction, transformation, and passing of data. While this approach works, it has several limitations:

Lack of Serialization: Custom bridging code is challenging to serialize and integrate into saved pipeline configurations, limiting our ability to share and reuse pipeline setups efficiently.
Increased Complexity: Each transition between components introduces additional code that must be maintained and understood, adding complexity to our pipeline configurations.
Inflexibility: Adapting pipelines to new requirements or different configurations often requires rewriting or significantly modifying the bridging code.

Proposal:

To address these challenges, let's consider the introduction of a hypothetical ComponentAdapter in 2.x. The ComponentAdapter will provide a declarative, configurable way to map and transform outputs from one component to suit the input requirements of another. This adapter will be serializable and flexible, facilitating easier pipeline configuration, maintenance, and sharing.
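As a rough illustration of the idea, here is a minimal, framework-free sketch of what such an adapter could do at run time. Everything in it is an assumption made for illustration: `SketchAdapter` is a hypothetical name and not a Haystack API, the documents are faked with `SimpleNamespace`, and Python's `eval` stands in for a proper sandboxed template engine, which a real implementation would presumably use instead.

```python
import json
from types import SimpleNamespace
from typing import Any, Dict, List


class SketchAdapter:
    """Hypothetical stand-in for the proposed ComponentAdapter (not a Haystack API)."""

    def __init__(self, inputs: List[str], outputs: List[Dict[str, Any]]):
        self.inputs = inputs      # names of the values the adapter expects
        self.outputs = outputs    # declarative output specs: expression, name, type

    def run(self, **kwargs: Any) -> Dict[str, Any]:
        missing = [name for name in self.inputs if name not in kwargs]
        if missing:
            raise ValueError(f"missing inputs: {missing}")
        results: Dict[str, Any] = {}
        for spec in self.outputs:
            expression = spec["output"].strip()
            # Strip the surrounding "{{ ... }}" template delimiters, if present.
            if expression.startswith("{{") and expression.endswith("}}"):
                expression = expression[2:-2].strip()
            # ASSUMPTION: eval is only a placeholder for a sandboxed template engine.
            value = eval(expression, {"__builtins__": {}, "json": json}, dict(kwargs))
            results[spec["output_name"]] = value
        return results


# Fake document carrying a function definition and its OpenAPI spec.
doc = SimpleNamespace(content='{"name": "get_weather"}', meta={"spec": {"openapi": "3.0.0"}})

adapter = SketchAdapter(
    inputs=["documents"],
    outputs=[
        {"output": "{{ documents[0].meta['spec'] }}", "output_name": "service_openapi_spec", "output_type": Any},
        {"output": "{{ json.loads(documents[0].content) }}", "output_name": "functions", "output_type": Any},
    ],
)
result = adapter.run(documents=[doc])
```

The point of the sketch is only that the mapping lives in data (strings and names) rather than in hand-written code between pipeline runs.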

Benefits:

Streamlined Pipeline Configuration: Define input-output mappings in a clear, declarative manner, making pipelines easier to understand and configure.
Improved Maintainability: Reduce the amount of custom code needed for bridging component outputs and inputs, making pipelines easier to maintain and adapt.
Enhanced Flexibility and Reusability: Facilitate the reuse of components and pipelines across different contexts with minimal code changes.

To get a better feel for this component, consider the common scenario below encountered in non-trivial NLP tasks:

from haystack import Pipeline
from haystack.components import OpenAPIServiceToFunctions, GPTChatGenerator, OpenAPIServiceConnector
from haystack.dataclasses import ChatMessage
import requests
import json

# Initial indexing pipeline
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("spec_to_functions", OpenAPIServiceToFunctions())
results = indexing_pipeline.run(data={"sources": ["https://bit.ly/3tdRUM0"],
                                      "system_messages": [requests.get("https://bit.ly/48eN0ND").text]})

# Manual extraction and transformation
top_1_document = results["spec_to_functions"]["documents"][0]
openai_functions_definition = json.loads(top_1_document.content)
openapi_spec = top_1_document.meta["spec"]

# Second pipeline for service invocation
invoke_service_pipe = Pipeline()
invoke_service_pipe.add_component("functions_llm", GPTChatGenerator(model_name="gpt-3.5-turbo-0613"))
invoke_service_pipe.add_component("openapi_container", OpenAPIServiceConnector())
invoke_service_pipe.connect("functions_llm.replies", "openapi_container.messages")

# Run the second pipeline with manually transformed data
# (user_instruction is assumed to be a string holding the user's request, defined elsewhere)
service_response = invoke_service_pipe.run(data={"messages": [ChatMessage.from_user(user_instruction)],
                                                 "generation_kwargs": {"functions": [openai_functions_definition]},
                                                 "service_openapi_spec": openapi_spec})

And after we introduce such a component:

from typing import Any
from haystack import Pipeline
from haystack.components import OpenAPIServiceToFunctions, GPTChatGenerator, OpenAPIServiceConnector, ComponentAdapter
import requests
import json

# Define ComponentAdapter to automatically transform and pass data
outputs = [
    {
        "output": "{{ documents[0].meta['spec'] }}",
        "output_name": "service_openapi_spec",
        "output_type": Any,
    },
    {
        "output": "{{ json.loads(documents[0].content) }}",
        "output_name": "functions",
        "output_type": Any,
    },
]
adapter = ComponentAdapter(inputs=["documents", "runtime_or_additional_run_input"], outputs=outputs)

# Unified pipeline with ComponentAdapter
pipeline = Pipeline()
pipeline.add_component("spec_to_functions", OpenAPIServiceToFunctions())
pipeline.add_component("adapter", adapter)
pipeline.add_component("functions_llm", GPTChatGenerator(model_name="gpt-3.5-turbo-0613"))
pipeline.add_component("openapi_container", OpenAPIServiceConnector())

# Feed the extracted documents into the adapter, then connect its outputs
pipeline.connect("spec_to_functions.documents", "adapter.documents")
pipeline.connect("adapter.service_openapi_spec", "openapi_container.service_openapi_spec")
pipeline.connect("adapter.functions", "functions_llm.generation_kwargs")
pipeline.connect("functions_llm.replies", "openapi_container.messages")

# Run the pipeline with single data input
results = pipeline.run(data={"sources": ["https://bit.ly/3tdRUM0"],
                             "system_messages": [requests.get("https://bit.ly/48eN0ND").text]})

With ComponentAdapter, this manual process can be replaced by a configurable component.
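Since the adapter configuration consists mostly of plain strings, it should be straightforward to save alongside the pipeline. The one field that is not directly serializable is `output_type`. The sketch below shows one possible way to round-trip the output specs through JSON by storing type names. The helper names and the `TYPE_NAMES` table are hypothetical and only cover a few types, so a real implementation would need a more general scheme.

```python
import json
from typing import Any, Dict, List

# ASSUMPTION: a small, hypothetical name <-> type table used only for this sketch.
TYPE_NAMES: Dict[str, Any] = {"typing.Any": Any, "str": str, "dict": dict}


def type_to_name(tp: Any) -> str:
    """Look up the serialized name of a supported type."""
    for name, candidate in TYPE_NAMES.items():
        if candidate is tp:
            return name
    raise ValueError(f"unsupported output type: {tp!r}")


def outputs_to_json(outputs: List[Dict[str, Any]]) -> str:
    """Replace each output_type with its name so the specs become JSON-serializable."""
    serializable = [{**spec, "output_type": type_to_name(spec["output_type"])} for spec in outputs]
    return json.dumps(serializable, indent=2)


def outputs_from_json(text: str) -> List[Dict[str, Any]]:
    """Restore the specs, mapping type names back to Python types."""
    return [{**spec, "output_type": TYPE_NAMES[spec["output_type"]]} for spec in json.loads(text)]


outputs = [
    {"output": "{{ documents[0].meta['spec'] }}", "output_name": "service_openapi_spec", "output_type": Any},
    {"output": "{{ json.loads(documents[0].content) }}", "output_name": "functions", "output_type": Any},
]

saved = outputs_to_json(outputs)      # plain JSON text, safe to store in a pipeline file
restored = outputs_from_json(saved)   # equivalent to the original specs
```

Hand-written bridging code between two `run()` calls has no comparable representation, which is the serialization gap described in the motivation.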

Describe alternatives you've considered

So far I have always resorted to manual "bridging code", with the plan to write meta components for this in the future.

Additional context

ComponentAdapter is meant for the simpler cases of data transformation and bridging in our NLP pipelines. That lets us reserve custom "meta components" for more complex scenarios that require advanced data manipulation, intricate exception handling, or specialized processing. Previously, we planned to use meta components even for relatively trivial bridging tasks, which risked overengineering and unnecessary complexity. With ComponentAdapter, users get a lighter option for basic data transformation, which simplifies pipeline construction for straightforward tasks and keeps the design focused. Meta components can then be kept for the more challenging parts of a task, where their full capabilities are actually needed.
