Questions:
When brainstorming this feature, we talked about several breakpoint locations.
I think we should narrow it down all the way to just pre-LLM execution. At the end of the day, this is a prompt engineering tool, not a debugging tool, and we need to focus accordingly.
Reference: Watch the video at https://www.braintrust.dev/blog/announcing-series-a for an example of how Braintrust scores and evaluates prompts. @parmi02 @LukeLalor
Problem
Prompt engineering is slow, and in a multi-agent system it is very hard. You need to be able to quickly experiment with a message arbitrarily deep in the stack. The current replay feature is helpful, but it does not work well with Docker / Kubernetes deployments, makes it hard to correlate LLM requests (i.e., find where you are), and only supports point-in-time experimentation rather than modifying the rest of the request.
Out of Scope
This is supposed to be a prompt engineering tool. Developers should use code breakpoints when building custom agent templates or tools.
Proposal
Introduce a debugger into the Eidolon dev tools component. Users can set breakpoints on agents / tools and step through them with the debugger controls they are used to from their IDE. They can also review a conversation after the fact, inspect LLM requests, and experiment with re-executing requests with alternative parameters.
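As a rough sketch of the underlying model (all names here are hypothetical, not existing Eidolon classes): a breakpoint targets an agent / tool, and each paused or recorded request becomes a frame the debugger can inspect.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Breakpoint:
    agent: str                # agent name to pause on
    tool: str | None = None   # optionally narrow to a single tool call
    enabled: bool = True


@dataclass
class Frame:
    """One paused or recorded LLM / tool request, analogous to an IDE stack frame."""
    process_id: str
    agent: str
    depth: int                # position in the call stack
    variables: dict[str, Any] = field(default_factory=dict)  # messages, prompt inputs, model params
```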
Debug Pane
Debugger controls, stack, evaluation option(s), variables, and save (propagate variable changes back to the system).
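To make those pieces concrete, the pane state might be shaped roughly like the sketch below (reusing the hypothetical `Frame` above; none of this is a committed design):

```python
from dataclasses import dataclass, field


@dataclass
class DebugPaneState:
    frames: list = field(default_factory=list)           # the stack, innermost frame first
    selected: int = 0                                    # frame currently being inspected
    pending_edits: dict = field(default_factory=dict)    # unsaved variable changes

    def save(self) -> None:
        """Propagate the user's variable edits back into the selected frame."""
        self.frames[self.selected].variables.update(self.pending_edits)
        self.pending_edits.clear()
```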
Breakpoint Pane
Technical Notes
Dev Plan
To reduce scope, we can start with part 2: re-executing LLM / tool requests. This allows us to introduce most of the concepts (frames, stack, variables, execution) without needing to worry about enabling / disabling breakpoints, debugger controls (step into / step over), the REST API concept, or handling resumed execution.
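A hedged sketch of what that re-execution could look like; `replay_llm_request` and the `llm_client.complete` interface are assumptions for illustration, not existing Eidolon APIs:

```python
import copy


def replay_llm_request(frame, llm_client, overrides: dict):
    """Re-run one recorded LLM call with alternative variables.

    This is point-in-time experimentation: the original process is
    untouched unless the user saves the edits back (see Debug Pane).
    """
    variables = copy.deepcopy(frame.variables)
    variables.update(overrides)  # e.g. {"temperature": 0.2, "system_prompt": "..."}
    messages = variables.pop("messages", [])
    return llm_client.complete(messages=messages, **variables)
```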