Open · sestinj opened this issue 1 year ago
Hey @sestinj, do you have any framework in mind for continuously testing and optimizing such prompts, as well as providing some kind of benchmarking? Starting by establishing a minimal viable set of rules would help iterate on and automate the process. This sounds like an interesting topic to me, and I'm going to explore it from my side too and come back here with some ideas.
@my8bit I see I missed this, but we don't currently have any super rigorous process. It was mostly a lot of iteration early on with GPT-4, and for open-source models we've just found one decent pattern and stuck to it. I think you've correctly identified this as an area open to major improvement!
Edit: at times, we've set up a small number of tests. I believe there was a good dataset of broken and resolved versions of certain competitive programming problems. We would run through about 50 of these and see how the model performed, but that is now a slightly different task from the edit prompts we use today.
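For anyone who wants to pick this up, that kind of harness boils down to something like the following. This is a hypothetical sketch (the dataset shape and the `generateEdit` function are made up for illustration, not taken from the repo):

```typescript
// Hypothetical benchmark loop for edit prompts -- not the actual scripts we used.
// Assumes a small dataset of (broken code, expected fixed code, instruction)
// triples and a `generateEdit` function that calls whichever model is under test.

interface EditCase {
  brokenCode: string;   // the "broken" version of the problem
  expectedCode: string; // the known-good resolved version
  instruction: string;  // e.g. "fix the bug so all tests pass"
}

async function runEditBenchmark(
  cases: EditCase[],
  generateEdit: (code: string, instruction: string) => Promise<string>,
): Promise<number> {
  let passed = 0;
  for (const c of cases) {
    const output = await generateEdit(c.brokenCode, c.instruction);
    // Crude exact-match scoring; a real harness would run tests or compare more loosely.
    if (output.trim() === c.expectedCode.trim()) {
      passed++;
    }
  }
  console.log(`${passed}/${cases.length} cases passed`);
  return passed / cases.length;
}
```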
Since Continue works with any model, there's a ton of prompt engineering needed to optimize for each of them.
The most important area for work is the /edit slash command. GPT-4 is able to handle a very complicated prompt that we give it here, but smaller open-source models often struggle and need a simpler prompt. The goal is to reliably convert
(previous code, user instructions) --> new code
without the model outputting any English.

Prompt templates for open-source models (we don't give them the whole complicated prompt that GPT-4 is able to handle) can be found here: https://github.com/continuedev/continue/blob/preview/core/llm/templates/edit.ts
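To give a concrete sense of the difference, a minimal template for a smaller open-source model might look something like this. This is an illustrative sketch with made-up names, not the actual contents of edit.ts:

```typescript
// Illustrative sketch only -- not the actual contents of core/llm/templates/edit.ts.
// The shape of the problem: (previous code, user instructions) -> new code, with
// the model emitting as little surrounding English as possible.

interface EditPromptArgs {
  codeToEdit: string; // the code the user selected for editing
  userInput: string;  // the natural-language edit instruction
}

// A deliberately simple template, of the kind a smaller open-source model can follow.
function simpleEditPrompt({ codeToEdit, userInput }: EditPromptArgs): string {
  return [
    "Rewrite the code below according to the instruction.",
    `Instruction: ${userInput}`,
    "Code:",
    codeToEdit,
    "Rewritten code (output only code, with no explanation):",
  ].join("\n");
}
```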
There are also a handful of utilities for cleaning up the response that we use for cmd+shift+L ("quick edit"), which will eventually be merged somewhat with the /edit prompt: https://github.com/continuedev/continue/blob/preview/core/commands/slash/verticalEdit.ts
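Roughly, that cleanup amounts to stripping any leading English preamble and markdown fences from the model's reply before applying it as an edit. A hypothetical sketch of the idea, not the actual verticalEdit.ts code:

```typescript
// Hypothetical sketch of response cleanup -- not the actual code in
// core/commands/slash/verticalEdit.ts. Smaller models often prepend a sentence
// like "Sure, here is the updated code:" and wrap the result in markdown fences,
// both of which need to be removed before the edit can be applied.

const FENCE = "`".repeat(3); // a triple-backtick markdown code fence

function cleanModelResponse(raw: string): string {
  const lines = raw.trim().split("\n");

  // Drop a leading English preamble such as "Sure, here's the updated code:".
  if (
    lines.length > 0 &&
    /^(sure|certainly|okay|here)/i.test(lines[0].trim()) &&
    lines[0].trim().endsWith(":")
  ) {
    lines.shift();
  }

  // Drop opening and closing markdown fences if present.
  if (lines.length > 0 && lines[0].trim().startsWith(FENCE)) {
    lines.shift();
  }
  if (lines.length > 0 && lines[lines.length - 1].trim().startsWith(FENCE)) {
    lines.pop();
  }

  return lines.join("\n").trim();
}
```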