PrefectHQ / ControlFlow

🦾 Take control of your AI agents
https://controlflow.ai
Apache License 2.0

Hooks for user input? #219

Open dexhorthy opened 2 months ago

dexhorthy commented 2 months ago

Enhancement Description

I'm looking to understand the best way to bring the User into a conversation when the interaction isn't a CLI / CLI tool. I can think of a few workaround-ish ideas that might work:

  1. use prefect workflow pause/resume to go get the user input and resume with their input
  2. add a tool call like tell_user or ask_user_for_clarification that handles the IO via a websocket or something (see the sketch after this list)
  3. set user_input=True on a task, capture/forward stdin/stdout 😬
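
For option 2, here's a minimal sketch of what a blocking tool could look like. The queue pair stands in for the real websocket layer and the task objective is made up; ControlFlow tools are plain Python functions handed to a task via `tools=`:

```python
import queue

import controlflow as cf

# stand-ins for the real websocket/async layer: one queue per direction
to_user: queue.Queue[str] = queue.Queue()
from_user: queue.Queue[str] = queue.Queue()


def ask_user_for_clarification(question: str) -> str:
    """Relay a question to the user and block until they answer."""
    to_user.put(question)    # e.g. push down the websocket
    return from_user.get()   # blocks until the frontend replies


task = cf.Task(
    "Summarize the dataset; ask the user if anything is ambiguous",
    tools=[ask_user_for_clarification],
)
result = task.run()
```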

Use Case

Building autonomous agents for data engineering and data product management.

The interaction paradigms I'd like to be able to support include web-app chat via websockets or other async layer, in addition to more "outer loop" type channels like email, slack, sms, etc. For example, an agent might discover something and want to alert a user, or might complete a long-running task for a user and want to get the user's input.

These sorts of workflows fit nicely into the D(A)G-y sort of state machine that Prefect enables, but I'm trying to wrap my head around the best way to fit together these async, multi-player workflows, or even just find some workarounds / patterns that have worked well for applications with access to LLMs.

Proposed Implementation

# open to brainstorming but nothing off the top of my head
aaazzam commented 2 months ago

Hey @dexhorthy.

A resounding :hell_yeah: from us over here. You've nailed the two blessed ways at the moment: pause/resume + make a tool. The former is nice because it lets you sleep agents until they're ready, but the latter is more ergonomic IMO and lets you lean into writing cleaner prompts.

Would love to find a way to make these two patterns easier, so you can sleep an agent (sleeper agent!??!?) workflow until its tool result comes back in.

dexhorthy commented 1 month ago

One additional bit of detail as we're thinking through this: pause/suspend might work okay if you get a response in under an hour (after which the default timeout hits), but the natural way I had implemented this was:

flow -> AI task -> tool "get_confirmation_in_slack" -> Slack -> set a Variable mapping slack_msg_id to flow_run_id, then pause the flow with wait_for_input

slack webhook -> fastapi -> look up the flow_run_id from the inbound slack_msg_id of the Slack thread parent, then send_input to the paused flow
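
For concreteness, the receiver side (shared by both versions of this design) might look roughly like this. It's a sketch under assumptions: Prefect 3.x-style imports, a hypothetical `slack_<ts>` Variable naming scheme (Prefect variable names can't contain dots, hence the underscores), and a minimal Slack event payload. Note also that `send_input` pairs with `receive_input`; a flow paused with `wait_for_input` is resumed with `resume_flow_run(..., run_input=...)`, where the `run_input` keys match the RunInput model the flow paused with, so that's what the sketch uses:

```python
from uuid import UUID

from fastapi import FastAPI
from prefect.flow_runs import resume_flow_run
from prefect.variables import Variable

app = FastAPI()


@app.post("/slack/events")
def slack_webhook(payload: dict):
    event = payload["event"]

    # Slack threads replies under the parent message's ts; Prefect variable
    # names can't contain dots, so the flow stored the ts with underscores
    key = "slack_" + event["thread_ts"].replace(".", "_")
    flow_run_id = Variable.get(key)  # mapping written by the flow before pausing

    # hand the user's reply to the paused flow as its wait_for_input payload
    resume_flow_run(UUID(flow_run_id), run_input={"text": event["text"]})
    return {"ok": True}
```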

The issue there is that you end up with an error, because you can't pause a flow while a TaskRunContext is active:

2024-07-24 13:08:12,370 - prefect.task_runs - ERROR - Finished in state Failed('Task run encountered an exception RuntimeError: Cannot pause task runs.')

I am currently trying to rework this a bit:

flow -> AI task -> Slack, set the msg_id mapping, then pause at the flow level

slack webhook -> fastapi -> look up the flow_run_id as before, resume the paused flow
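
A sketch of that reworked flow side, under the same assumptions as the webhook sketch above (`post_to_slack` and the task objective are placeholders; `cf.run` is ControlFlow's shortcut for creating and running a single task). The pause now happens in the flow body after the AI task completes, so no TaskRunContext is active:

```python
import controlflow as cf
from prefect import flow
from prefect.flow_runs import pause_flow_run
from prefect.input import RunInput
from prefect.runtime import flow_run
from prefect.variables import Variable


class UserResponse(RunInput):
    text: str


def post_to_slack(text: str) -> str:
    """Placeholder for the real Slack client call; returns the message ts."""
    raise NotImplementedError


@flow
def pipeline():
    # the AI task runs to completion first; nothing pauses inside it
    plan = cf.run("Draft a migration plan for the orders table")

    # post to Slack and record the slack_msg_id -> flow_run_id mapping
    ts = post_to_slack(f"Please confirm this plan:\n{plan}")
    Variable.set("slack_" + ts.replace(".", "_"), str(flow_run.id))

    # pause at the flow level; pausing inside a task/tool raises
    # "RuntimeError: Cannot pause task runs." (default timeout: 3600s)
    answer = pause_flow_run(wait_for_input=UserResponse)
    return plan, answer.text
```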

but the magic of "use the LLM to do the reasoning" is a little lost in this case. I can't really articulate why I want this; maybe the "tool method" is, as you said, just "more ergonomic". It eliminates a lot of cognitive overhead in the implementation to tell an LLM "here's a tool you can use to ask a backend user for confirmation/input", where the tool itself returns the user's response, and the LLM can even make decisions about when to ask for approval vs. skip that step based on the completeness of its context, etc.

Among other things, doing the pause/resume at the flow level, outside the LLM, bubbles implementation details up from inside the tool all the way to the flow. Before, one or more "confirmation" implementations could be neatly bundled up in a modular tool and glued arbitrarily to the webhook receiver endpoint (e.g. before exploring flow pause/resume, we did this with an in-memory Queue that the flow, tasks, and LLM knew nothing about).
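
For illustration, a rough reconstruction of that queue-based shape (all names here are hypothetical, and an in-memory queue only works while the agent and the webhook handler share a process):

```python
import queue
import uuid

from fastapi import FastAPI

app = FastAPI()

# one waiting queue per outstanding confirmation request
_pending: dict[str, queue.Queue[str]] = {}


def send_to_slack(question: str, token: str) -> None:
    """Placeholder for the outbound Slack call; the token rides along
    (e.g. in message metadata) so the webhook can route the reply back."""
    raise NotImplementedError


def get_confirmation(question: str) -> str:
    """Tool handed to the LLM; the flow and tasks never see the transport."""
    token = str(uuid.uuid4())
    q: queue.Queue[str] = queue.Queue(maxsize=1)
    _pending[token] = q
    send_to_slack(question, token)
    answer = q.get()  # blocks this tool call until the user replies
    del _pending[token]
    return answer


@app.post("/slack/events")
def slack_webhook(payload: dict):
    # deliver the reply to whichever tool call is waiting on this token
    _pending[payload["token"]].put(payload["text"])
    return {"ok": True}
```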