eidolon-ai / eidolon

The first AI Agent Server, Eidolon is a pluggable Agent SDK and enterprise ready, deployment server for Agentic applications
https://www.eidolonai.com/
Apache License 2.0

Prompt Engineering Debugger #846

Open LukeLalor opened 1 month ago

LukeLalor commented 1 month ago

Problem

Prompt engineering is slow, and in a multi-agent system it is very hard: you need to experiment quickly with a message arbitrarily deep in the stack. The current replay feature is helpful, but it does not work well with Docker / Kubernetes deployments, makes it hard to correlate LLM requests (i.e., find where you are), and only supports "point in time" experimentation rather than modifying the rest of the request.

Out of Scope

This is intended to be a prompt engineering tool; developers should use code breakpoints when building custom agent templates or tools.

Proposal

Introduce a debugger into the Eidolon dev tools component. Users can set breakpoints on agents / tools and step through them with the debugger, just as they are used to doing in their IDE. They can also review a conversation after the fact, inspect LLM requests, and experiment with executing requests with alternative parameters.

Debug Pane

Debugger controls, stack, evaluation option(s), variables, and save (propagate variable changes to the system).

#################################################################################
#                                                                               #
#   Stop * Resume * Step Over * Step Into * Step Out Of                         #
#   -------------------------------------------------------------------------   #
#   Stack                             |   evaluate: > execute_llm               #
#   worker_agent.execute_llm          |   -----------------------               #
#   manager_agent.execute_tool_call   |   messages = [...]                      #
#   ...                               |   tools = [...]                         #
#                                     |   output_format = {...}                 #
#                                     |   [Reset][Save]                         #
#                                                                               #
#################################################################################
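The pane above could be backed by a simple stack-of-frames model. A minimal sketch, assuming hypothetical names (`Frame`, `DebugSession`, and the `execute_llm` / `execute_tool_call` hook names are illustrative, not Eidolon's actual API):

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Frame:
    """One entry in the debugger stack, e.g. worker_agent.execute_llm."""
    agent: str
    hook: str  # execute_llm, execute_tool_call, ...
    variables: dict[str, Any] = field(default_factory=dict)

    @property
    def label(self) -> str:
        return f"{self.agent}.{self.hook}"


@dataclass
class DebugSession:
    """The stack shown in the pane; the last frame is the one being inspected."""
    stack: list[Frame] = field(default_factory=list)

    def top(self) -> Frame:
        return self.stack[-1]

    def set_variable(self, name: str, value: Any) -> None:
        """Edit a variable in the top frame; 'Save' would propagate the change."""
        self.top().variables[name] = value


# Mirror the mockup: a manager frame below a worker's LLM call.
session = DebugSession(stack=[
    Frame("manager_agent", "execute_tool_call"),
    Frame("worker_agent", "execute_llm",
          variables={"messages": [], "tools": [], "output_format": {}}),
])
session.set_variable("messages", [{"role": "user", "content": "hi"}])
```

The key design point is that `messages`, `tools`, and `output_format` are just frame variables, so the evaluate box can edit any of them before re-running.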

Breakpoint Pane

#### Breakpoint Controls ####
[ ] Disable All
Agents:
    All
        [ ] execute_action
        [ ] execute_llm
    chatbot_agent
        [ ] execute_action
        [ ] execute_llm
    qa_agent
        ...
Logic Units
    All
        [ ] execute_tool_call
    ...
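The pane's state reduces to per-agent, per-hook flags with an "All" default and a global disable switch. A minimal sketch, assuming a hypothetical `BreakpointConfig` (not Eidolon's actual types), where an agent-specific setting overrides the "All" row:

```python
from dataclasses import dataclass, field


@dataclass
class BreakpointConfig:
    disable_all: bool = False                                       # [ ] Disable All
    defaults: dict[str, bool] = field(default_factory=dict)         # the "All" rows
    per_agent: dict[str, dict[str, bool]] = field(default_factory=dict)

    def should_break(self, agent: str, hook: str) -> bool:
        if self.disable_all:
            return False
        # An agent-specific checkbox wins over the "All" default.
        return self.per_agent.get(agent, {}).get(hook, self.defaults.get(hook, False))


# Break on every agent's execute_llm except qa_agent's.
config = BreakpointConfig(
    defaults={"execute_llm": True},
    per_agent={"qa_agent": {"execute_llm": False}},
)
```

With this shape, "Disable All" is a single flag the UI toggles without losing the individual checkbox state underneath it.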

Technical Notes

Dev Plan

To reduce scope we can start with part 2: re-executing LLM / tool requests. This lets us introduce most of the concepts (frames, stack, variables, execution) without needing to worry about enabling/disabling breakpoints, debugger controls (step into / step over), their REST API, or resuming a paused execution.
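The re-execution piece can be sketched independently of any stepping machinery: take a recorded request, merge in the user's edits, and run it again. The `recorded` shape and the `run_llm` callable below are assumptions for illustration, not Eidolon's actual interfaces:

```python
from typing import Any, Callable


def re_execute(recorded_request: dict[str, Any],
               overrides: dict[str, Any],
               run_llm: Callable[[dict[str, Any]], str]) -> str:
    """Merge edited variables (e.g. new messages or temperature) into the
    recorded request and rerun it, leaving the original record untouched."""
    request = {**recorded_request, **overrides}
    return run_llm(request)


# Stub LLM call for illustration; a real implementation would hit the model.
def fake_llm(request: dict[str, Any]) -> str:
    return f"answered {len(request['messages'])} messages at temp {request['temperature']}"


recorded = {"messages": [{"role": "user", "content": "hi"}], "temperature": 0.0}
result = re_execute(recorded, {"temperature": 0.7}, fake_llm)
```

Because the merge is non-destructive, the same recorded request can be replayed repeatedly with different overrides, which is exactly the "experiment with alternative parameters" workflow from the proposal.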

LukeLalor commented 1 month ago

Questions:

When brainstorming this feature, we talked about several breakpoint locations

I think we might want to narrow it down all the way to just pre-LLM execution. At the end of the day, this is a prompt engineering tool, not a general debugging tool, and we need to focus accordingly.

flynntsang commented 1 month ago

Reference: Watch the video at https://www.braintrust.dev/blog/announcing-series-a for an example of how they score and evaluate prompts. @parmi02 @LukeLalor