Open circbuf255 opened 3 weeks ago
Have you tested these with OpenAI and Claude models? How does it perform?
I did some testing with gpt-4o and llama3:instruct. The difference for GPT due to the re-ordering was fairly minimal: in one instance it was slightly better, in another slightly worse. On llama3, a few tests were markedly better, and a couple showed no noticeable difference.
These were a combination of my own patterns and the default ones, and it was all fairly cursory testing. I think it's worth exploring and would be up for some more systematic testing, but I haven't settled on a good approach yet. Maybe the OP has some ideas, or maybe there's already a suite of tests used during development?
What do you need?
After some experimentation, q8 llama3 and mistral models run locally perform much better after adjusting the prompt order. I propose restructuring the patterns so that the "# INPUT" section comes before the output instructions: placing INPUT before OUTPUT INSTRUCTIONS produces much more coherent responses.
As an example, let's look at extract_wisdom:
Proposed prompt order:

1. INPUT (--put your YouTube transcript here--)
2. IDENTITY and PURPOSE
3. STEPS
4. OUTPUT INSTRUCTIONS
5. OUTPUT
Current prompt order:

1. IDENTITY and PURPOSE
2. STEPS
3. OUTPUT INSTRUCTIONS
4. INPUT
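To make the proposal concrete, here is a minimal sketch of how a pattern's "# SECTION" blocks could be reordered programmatically. This is not fabric's actual code; `reorder_pattern` and the sample pattern text are hypothetical, and it assumes fabric-style patterns where each section starts with a `# HEADING` line.

```python
import re

def reorder_pattern(pattern_md: str, order: list[str]) -> str:
    """Split a fabric-style pattern into its '# SECTION' blocks and
    reassemble them in the given order (hypothetical helper)."""
    # Split at every line that begins with "# ", keeping the heading
    # with the text that follows it.
    blocks = re.split(r"(?m)^(?=# )", pattern_md)
    sections = {}
    for block in blocks:
        if not block.strip():
            continue
        heading = block.splitlines()[0].lstrip("# ").strip()
        sections[heading] = block.rstrip()
    return "\n\n".join(sections[name] for name in order if name in sections)

# Abbreviated stand-in for a pattern like extract_wisdom.
pattern = """# IDENTITY and PURPOSE
You extract wisdom from text.

# STEPS
Read the input carefully.

# OUTPUT INSTRUCTIONS
Use Markdown bullets.

# INPUT
INPUT:
"""

reordered = reorder_pattern(
    pattern,
    ["INPUT", "IDENTITY and PURPOSE", "STEPS", "OUTPUT INSTRUCTIONS"],
)
print(reordered.splitlines()[0])  # -> "# INPUT" now leads the prompt
```

A helper like this would also make A/B testing the two orderings easy, since the same pattern file can feed both variants.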
Models tested:

- llama3:8b-instruct-q8_0
- mistral:7b-instruct-v0.2-q8_0