deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0
16.66k stars 1.83k forks source link

ConditionalRouter output type is not working anymore #8161

Closed gustavo-has-stone closed 1 week ago

gustavo-has-stone commented 1 month ago

Describe the bug We tried to update the Haystack version to 2.3.1 due to the security update and our code stopped working. We identified that the ConditionalRouter is returning a string instead of the type assigned in output_type

Error message We are getting errors related to the fact that a string is now being returned. Here is an example of the error when running the provided code:

AttributeError: 'str' object has no attribute 'content'

Expected behavior We expected that the output_type provided in the route would be returned

Additional context The problem is related to version 2.3.1. Our provided example works on 2.3.0

To Reproduce The following code works on haystack 2.3.0 but it does not work on haystack 2.3.1:

from haystack.components.routers import ConditionalRouter
from haystack.dataclasses import ChatMessage, ChatRole

routes = [
    {
        "condition": "{{streams|length > 2}}",
        "output": "{{message}}",
        "output_name": "long_streams_message",
        "output_type": ChatMessage,
    },
    {
        "condition": "{{streams|length <= 2}}",
        "output": "{{message}}",
        "output_name": "short_streams_message",
        "output_type": ChatMessage,
    },
]

message = ChatMessage(content='Hi', role=ChatRole('user'), name='', meta={})
print(message.content)

router = ConditionalRouter(routes)

kwargs = {"streams": [1, 2, 3], "message": message}
result = router.run(**kwargs)
print(result['long_streams_message'].content)

FAQ Check

System:

silvanocerza commented 1 month ago

That's expected.

Because of the changes to fix the security issue we had to make this breaking change. The ConditionalRouter can only return the following Python literal types now: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, None and Ellipsis. Any other will be returned as a string.

I would suggest creating a custom Component to cover those cases where the ConditionalRouter can't be used as before.

Something like this should do the trick:

from typing import List

from haystack import component
from haystack.dataclasses import ChatMessage

@component
class Decision:
    @component.output_types(short_streams_message=ChatMessage, long_streams_message=ChatMessage)
    def run(self, streams: List[str], message: ChatMessage):
        if len(streams) > 2:
            return {"long_streams_message": message}
        return {"short_streams_message": message}
gustavo-has-stone commented 1 month ago

I see. So the parameter output_type will not work anymore?

We did not expect to see a breaking change with a patch version change. Should we worry about those updates too?

silvanocerza commented 1 month ago

output_type has never been necessary to convert the type, it's only used to set the correct output type for the Component. Even before this change you could put any type in there but it wouldn't guarantee that the Component will return that type when calling run. That's only necessary when calling Pipeline.connect() to validate the connection.

We decided to release this security change as a patch release even if it's a breaking change as it's a pretty critical, it could lead to remote code execution.

mathislucka commented 1 month ago

Perhaps we could add an option to the ConditionalRouter so that it can route inputs to the selected route's output without rendering a jinja2 template?

I also had this case, where I wanted to route a list of Documents depending on a reply generated from the LLM. I don't have to go through jinja to route these documents because I don't want to change anything about them.

So something like:

routes = [
{"condition": "{{...}}", output: "documents", output_name: "a_documents", "output_type": List[Document], "use_literal_output": True}
]
router = ConditionalRouter(routes=routes)

router.run(documents=[Document(content="hey")])

# returns the document passed in unchanged

https://github.com/deepset-ai/haystack/blob/e17d0c41926849975166e70ec46411f402c8098e/haystack/components/routers/conditional_router.py#L206

This part would need to be changed to something like:

if route.get("use_literal_output"):
    output = kwargs[route["output"]]
else:
    ...

Would that solve your use case too @gustavo-has-stone ?

gustavo-has-stone commented 1 month ago

In our scenario, we were routing a variable of a custom type (similar to ChatMessage) based on the value of an attribute of this variable. I did not fully understand your example. Could you provide more details?

alexsniffin commented 1 month ago

Hi, I ran into this problem too when upgrading to 2.3.1. I use a custom dataclass type for my pipeline which isn't working as expected anymore. Is there a suggested fix for this without creating a custom router?

silvanocerza commented 1 month ago

@gustavo-has-stone @alexsniffin

We discussed a bit internally and decided to give the possibility to enable the previous behaviour by creating ConditionalRouter and OutputAdapter with unsafe=True.

I just created a PR to bring that change in and we'll release it on the upcoming 2.4.0 some time next week. For the time being, and if you consider it safe, you can still rely on the previous behaviour by using 2.3.0 instead of the latest 2.3.1.

gustavo-has-stone commented 1 month ago

I don't think that using an unsafe version is a good solution for us. We will try to find an alternative solution for our use case

dmvieira commented 1 month ago

Hey @silvanocerza Idk if it`s a good solution because you are publishing a version that the only way to use previews behavior is forcing unsafe. It forces people to open security breaches when need a feature.

An approach that could help us is using Haystack ChatMessage as Router output. Do you know if is it possible?

silvanocerza commented 1 month ago

I don't think that using an unsafe version is a good solution for us. We will try to find an alternative solution for our use case

It's unsafe only if the Jinja templates used are users inputs too. If you know the templates it's completely safe to use.

Hey @silvanocerza Idk if it`s a good solution because you are publishing a version that the only way to use previews behavior is forcing unsafe. It forces people to open security breaches when need a feature.

An approach that could help us is using Haystack ChatMessage as Router output. Do you know if is it possible?

Setting unsafe to True doesn't automatically open you to security breaches. It enables a behaviour that is considered unsafe in certain circumstances, that is when ConditionalRouter or OutputAdapter use templates that can be considered users input.

If you know what you're doing and know you're not using users inputs as Jinja template that is completely fine to use.

gustavo-has-stone commented 1 month ago

Let me see if I understand. If we try to route based on some controlled condition (e.g., message length), we will be fine. However, if we try to use the content of a message for conditional routing, we could be in trouble. Is that right?

silvanocerza commented 1 month ago

Let me see if I understand. If we try to route based on some controlled condition (e.g., message length), we will be fine. However, if we try to use the content of a message for conditional routing, we could be in trouble. Is that right?

No. You are vulnerable if you let your users write their own templates from scratch.

A simple example:

from typing import List
from haystack.components.routers import ConditionalRouter

user_input = input("Write a Jinja template:")

routes = [
    {
        "condition": user_input,
        "output": "{{streams}}",
        "output_name": "enough_streams",
        "output_type": List[int],
    },
    {
        "condition": "{{streams|length <= 2}}",
        "output": "{{streams}}",
        "output_name": "insufficient_streams",
        "output_type": List[int],
    },
]
router = ConditionalRouter(routes)
...
router.run(whatever_input)
dmvieira commented 1 month ago

Thank you @silvanocerza . It sounds like a plan! 😄