DAGWorks-Inc / burr

Build applications that make decisions (chatbots, agents, simulations, etc...). Monitor, trace, persist, and execute on your own infrastructure.
https://burr.dagworks.io
BSD 3-Clause Clear License

1-to-many transitions #359

Open zilto opened 2 months ago

zilto commented 2 months ago

Currently

Below is a valid Burr graph definition to "get a user input, and select the most appropriate action out of 3"

graph = (
  GraphBuilder()
  .with_actions(
    process_user_input,
    decide_next_action,
    generate_text,
    generate_image,
    ask_for_details,
  )
  .with_transitions(
      ("process_user_input", "decide_next_action"),
      ("decide_next_action", "generate_text"),
      ("decide_next_action", "generate_image", expr("mode=='generate_image'")),
      ("decide_next_action", "ask_for_details", expr("mode=='ask_user_for_details'")),
  )
  .build()
)

Notes:

Desired solution

The main benefit of the above is that everything is explicit. But it can also be less ergonomic, add complexity when defining transitions, and be inefficient when evaluating transitions is expensive.

Consider the following API

ApplicationBuilder()
  .with_actions(
    process_user_input,
    generate_text,
    generate_image,
    ask_user_for_details,
  )
  .with_transitions(
    ("process_user_input", "decide_next_action"),
    (
      "decide_next_action",
      ["generate_text", "generate_image", "ask_user_for_details"],
      OneToManyCondition(...)   # TODO
    ),
  )
  .with_entrypoint("process_user_input")
  .build()

Note:

Use case

The popular use case I have in mind is "use an LLM to decide the next node". Given the Graph building process, it would be possible to dynamically create a model of "available next actions". Here's a sketch using instructor, which has better guarantees regarding structured LLM outputs (for OpenAI at least)

ApplicationBuilder()
  .with_actions(
    process_user_input,
    generate_text,
    generate_image,
    ask_user_for_details,
  )
  .with_transitions(
    ("process_user_input", "decide_next_action"),
    (
      "decide_next_action",
      ["generate_text", "generate_image", "ask_user_for_details"],
      LLMDecider()
    ),
  )
  .with_entrypoint("process_user_input")
  .build()

from typing import Literal

from pydantic import Field, create_model

def create_decision_model(tos: list[FunctionBasedAction]):
    next_action_names = []
    next_action_descriptions = ""
    for to in tos:
        if not to.__doc__:
            raise ValueError(f"LLMDecider: {to.name} needs to have a non-empty docstring.")

        next_action_names.append(to.name)
        next_action_descriptions += f"{to.name}\n{to.__doc__}\n\n"

    return create_model(
        "next_action",
        next_action=(
            Literal[tuple(next_action_names)],
            Field(description="AVAILABLE ACTIONS\n\n" + next_action_descriptions)
        )
    )

def _llm_resolver(state: State, llm_client, response_model) -> str:
    user_query = state["user_query"]

    next_action = llm_client.chat.completions.create(
        model="gpt-4o-mini",
        response_model=response_model,
        messages=[
            {"role": "system", "content": "You are an automated agent and you need to decide the next action to take to fulfill the user query."},
            {"role": "user", "content": user_query}
        ]
    )

    return next_action.next_action  # return the chosen action name as a string

# apply something along those lines
condition = Condition(keys=["user_query"], resolver=partial(_llm_resolver, llm_client=..., response_model=create_decision_model))

elijahbenizzy commented 2 months ago

Yep, this is clean. OneToMany -> Select maybe?

Current transition = if/else
Select = switch statement
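To make the "switch statement" idea concrete, here is a pure-Python sketch of what a Select-style condition could look like. The class name, signature, and `resolve` method are hypothetical, not part of the Burr API: the point is that one resolver returns exactly one target from the candidate list, rather than evaluating a chain of if/else conditions.

```python
from typing import Callable

class Select:
    """Hypothetical 1-to-many condition: one resolver picks one action name."""

    def __init__(self, resolver: Callable[[dict], str]):
        self.resolver = resolver

    def resolve(self, state: dict, candidates: list[str]) -> str:
        # Single evaluation, switch-statement style
        choice = self.resolver(state)
        if choice not in candidates:
            raise ValueError(f"Resolver returned {choice!r}, expected one of {candidates}")
        return choice

# Usage: dispatch on state["mode"] in a single evaluation
select = Select(lambda state: state["mode"])
next_action = select.resolve(
    {"mode": "generate_image"},
    ["generate_text", "generate_image", "ask_user_for_details"],
)
```

The validation step also gives the framework a place to fail loudly if the resolver returns something outside the declared targets.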

skrawcz commented 2 months ago

> and be inefficient when computing transitions is expensive.

Can you clarify? I don't understand that comment.

> No longer requires a "decide node";

I'm not sure I follow this, you still have a decide_next_action in your example? Isn't that the same as before?

> LLMDecider()

I don't follow your code above, where is this defined?

> Here's a sketch using instructor, which has better guarantees regarding structured LLM outputs (for OpenAI at least)

instructor is just an implementation to get the LLM to output something structured. I'm not sure how it's relevant here? You can use that same instructor call in the body of an action.

If I'm understanding correctly, you're saying that this explicitness / verboseness:

.with_transitions(
      ("process_user_input", "decide_next_action"),
      ("decide_next_action", "generate_text"),
      ("decide_next_action", "generate_image", expr("mode=='generate_image'")),
      ("decide_next_action", "ask_for_details", expr("mode=='ask_user_for_details'")),
  )

is slowing you down while iterating?

It's obviously a trade-off; to me, the following requires more work to understand how things work:

.with_transitions(
   ("process_user_input", "decide_next_action"),
   (
     "decide_next_action",
     ["generate_text", "generate_image", "ask_user_for_details"],
     LLMDecider()
   ),
 )

Otherwise, the impacts on the UI and debugging need to be considered, since effectively you're pushing state computation to an edge...


IIUC though, it sounds like the main pain is having to update the edge -- so we could enable a new expression/construct instead, e.g.

 .with_transitions(
     ("process_user_input", "decide_next_action"),
     (
         "decide_next_action",
         ["generate_text", "generate_image", "ask_for_details"],
         switch("mode=='{{action.name}}'"),
     ),
 )
elijahbenizzy commented 2 months ago

So I think the switch statement is nice as a concept, but I'd decouple it from the LLM stuff. A few things in the example above:

  1. burying the heavy-lifting in the edge (generally bad practice in Burr)
  2. Ensuring readability -- switch, or the OneToMany condition -- this helps + makes it concise
  3. Dynamic edges -- hard to follow the code on LLMDecider -- that seems to me to add a level of indirection that's not necessarily needed

But the switch statement is the main purpose, which I like. This depends on how common this is -- I could see it useful for tool calling, but I'm not sure how generalizable this is? TBH I'm not sure the mode-switching is actually a great first example, cause ChatGPT just does that for you.

zilto commented 2 months ago

@skrawcz I used the LLMDecider idea as an example of a common use case, but it's not the point of the feature.

The main point is

> If I'm understanding correctly, you're saying that this explicitness / verboseness: is slowing you down while iterating?

.with_transitions(
    ("process_user_input", "decide_next_action"),
    ("decide_next_action", "generate_text"),
    ("decide_next_action", "generate_image", expr("mode=='generate_image'")),
    ("decide_next_action", "ask_for_details", expr("mode=='ask_user_for_details'")),
)
  1. For one, the number of conditions here is small. The current interface provides no guarantee that all decide_next_action -> ... transitions are sorted and placed together, and users have no easy way to enforce that. The following, by contrast, makes it very obvious where to edit the code:
(
  "decide_next_action", 
   ("generate_text", "generate_image", "ask_for_details"), 
   Select(...),
),

Also, we already support "many-to-one" definitions, so that doesn't seem outlandish

 (
     ("generate_text", "generate_image", "ask_user_for_details"),
     "send_response"
 ),
  2. Relying on the order of statements to evaluate conditions is brittle and not obvious to the user. If the app behaves oddly, they have to know that ordering matters. The good-faith user who wants to sort the above messy code (argument 1.) will break their own app by sorting the code. It creates a "don't touch it if it ain't broken" mentality and reduces maintainability.
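The order-dependence can be demonstrated with a minimal first-match model of transition resolution (a sketch of the evaluation semantics, not Burr internals):

```python
# Minimal model of first-match transition resolution: conditions are checked
# in declaration order, and a None condition acts as an unconditional default.
def first_match(transitions, state):
    for frm, to, cond in transitions:
        if cond is None or cond(state):
            return to
    raise RuntimeError("no transition matched")

conditional = [
    ("decide", "generate_image", lambda s: s["mode"] == "generate_image"),
    ("decide", "ask_for_details", lambda s: s["mode"] == "ask_for_details"),
    ("decide", "generate_text", None),  # default, declared last
]
# "Sorting" the list so the default comes first silently changes behavior:
sorted_badly = [conditional[2], conditional[0], conditional[1]]

state = {"mode": "generate_image"}
correct = first_match(conditional, state)    # -> "generate_image"
shadowed = first_match(sorted_badly, state)  # -> "generate_text" (default shadows the rest)
```

Reordering the edges produced a different result from the same state, which is exactly the maintainability hazard described above.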

> and be inefficient when computing transitions is expensive.

  3. If a node has 1 -> n transitions, the current approach may evaluate up to n conditions sequentially, whereas Select performs a single evaluation to decide among the n actions.
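The evaluation-count argument can be sketched as follows (illustrative, not Burr internals): with n conditional edges, first-match evaluation may invoke up to n predicates in the worst case, while a Select-style resolver decides with a single call.

```python
# Count predicate invocations: sequential first-match vs. a single resolver.
calls = {"expr": 0, "select": 0}

def make_pred(name):
    def pred(state):
        calls["expr"] += 1
        return state["mode"] == name
    return pred

targets = ["generate_text", "generate_image", "ask_for_details"]
preds = [(t, make_pred(t)) for t in targets]

state = {"mode": "ask_for_details"}  # worst case: matches the last edge

# Current approach: check each condition in declaration order until one matches
for target, pred in preds:
    if pred(state):
        chosen_seq = target
        break

# Select-style: one resolver call picks the target directly
def select_resolver(state):
    calls["select"] += 1
    return state["mode"]

chosen_select = select_resolver(state)
# calls["expr"] == 3, calls["select"] == 1
```

For cheap expressions this hardly matters, but if each condition involves an expensive computation (e.g. an LLM call), collapsing n checks into one decision is a real saving.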