Closed FraserLee closed 1 year ago
Just so that we don't duplicate, I'm nearly done with https://github.com/StampyAI/stampy-chat/issues/6. I should push something within a few days.
@cvarrichio This PR mostly sets up a means to easily benchmark / test new prompts, writing the output to a CSV file. Fraser's looking for feedback on it. If you're able to play around with this new setup, it might speed up your own testing and refinement process as well.
I'm pretty sure this is exactly what I've been working on. Pausing for now.
@cvarrichio Shoot, you're right! I remember that you wanted to work on prompt engineering, and now I remember mentioning that having the testing in place would make that work faster. I see that I even wrote it down in our planning docs, but Fraser wasn't at the meeting so probably didn't catch that. So sorry for the confusion.
FYI, I added some specific things to try for prompt engineering in #3, which should hopefully be much easier to try with these changes.
Ah, sorry for stepping on your toes there, I thought you were working more on #2. If you have any improvements to my version, feel free to open a second PR on top!
Prompt Engineering Workflow
As I had said previously, our prompt is a function - not a template. Certain parameters - what goes in a system vs a user prompt, how we delineate sources, where to break messages, how to handle history, etc - carry too much weight to be cleanly represented with anything but code (a hedged sketch of what such a function can look like follows the steps below). That said, the process should be fairly simple and straightforward even for non-technical people:

1. Adjust `construct_prompt` in `api/chat.py`.
2. Run `cd api`, then `pipenv run python3 prompteng/prompteng.py`.
3. In a Google Sheet, `file->import`. Click `upload`. Drag and drop the file `api/prompteng/data/answers.csv`.
4. `cmd + a`. Click `format->wrapping->wrap`. Adjust column widths to preference.
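For reference, here's a minimal sketch of the kind of function step 1 refers to. It's purely illustrative and assumes an OpenAI-style message list; the names, history truncation, and source delineation below are my assumptions, not the actual contents of `api/chat.py`:

```python
# Hypothetical sketch, NOT the actual construct_prompt in api/chat.py:
# names, truncation, and source delineation are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Block:
    title: str
    text: str

def construct_prompt(query: str, history: list[dict], sources: list[Block]) -> list[dict]:
    """Build an OpenAI-style message list from a query, chat history, and sources."""
    # System message: fixed instructions plus numbered, delineated sources.
    source_text = "\n\n".join(
        f"[{i + 1}] {s.title}\n{s.text}" for i, s in enumerate(sources)
    )
    messages = [{
        "role": "system",
        "content": "Answer using only the numbered sources below.\n\n" + source_text,
    }]
    # Replay a truncated window of history as alternating user/assistant turns.
    messages += history[-10:]
    # The actual question goes last, in its own user message.
    messages.append({"role": "user", "content": query})
    return messages
```

The point is that choices like the history window or the source numbering scheme live in code, which is why a plain template can't capture them yet.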
Possible future improvements
Copying the CSV over manually is burdensome, but my efforts to get the Google Sheets API working didn't pan out. Setting this up backed by our own DB could be nice.
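If we revisit the Sheets route, one option is the third-party gspread library. This is a sketch under the assumption of a service-account credential file and sheet name (both placeholders), not something I've verified against our setup:

```python
# Hedged sketch: push answers.csv into a Google Sheet with gspread.
# Requires a Google Cloud service account; file and sheet names are placeholders.
import csv
import gspread

def upload_answers(csv_path: str = "prompteng/data/answers.csv") -> None:
    gc = gspread.service_account(filename="service_account.json")
    sheet = gc.open("stampy prompt eval").sheet1
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    # Overwrite the sheet contents starting at A1.
    sheet.update(range_name="A1", values=rows)
```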
We could throw GPT-4 in the loop to evaluate answers on some simple criteria (eg "Does the following answer contain any information not in the sources? Write about three sentences of explanation, then on a new line answer either YES or NO").
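Sketching what that loop might look like with the OpenAI Python SDK - model choice, prompt wording beyond the quoted criterion, and verdict parsing are all assumptions:

```python
# Hedged sketch of a GPT-4 grading pass over (answer, sources) pairs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def grounded(answer: str, sources: str) -> bool:
    prompt = (
        "Does the following answer contain any information not in the sources? "
        "Write about three sentences of explanation, then on a new line answer "
        f"either YES or NO.\n\nSOURCES:\n{sources}\n\nANSWER:\n{answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    # Take the final line as the verdict; NO = no unsourced info, i.e. grounded.
    verdict = resp.choices[0].message.content.strip().splitlines()[-1]
    return "NO" in verdict.upper()
```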
Although right now the prompt is loose enough that we need to keep it as a function, there might come a point where we nail down the outer structure (eg 1 system message, 1 user message, sources formatted a certain way, final message of X, etc). We're not there yet, but when we reach that point we could store the instruction-text parts in the spreadsheet, where they'd be more easily modified and tracked.
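To make the idea concrete, a hypothetical sketch of what a nailed-down outer structure plus sheet-stored instruction text could look like (all names invented):

```python
# Hedged sketch: once the outer structure is fixed, only the instruction
# strings vary, so each variant could be a row in the spreadsheet.
FIXED_STRUCTURE = [
    ("system", "{instructions}\n\nSOURCES:\n{sources}"),
    ("user", "{question}"),
]

def render(variant: dict[str, str], sources: str, question: str) -> list[dict]:
    """variant holds the editable text pieces, e.g. a row from the sheet."""
    return [
        {"role": role, "content": template.format(
            instructions=variant["instructions"], sources=sources, question=question)}
        for role, template in FIXED_STRUCTURE
    ]
```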
I initially had the output format closer to what was described in #6, with the sources in separate columns to the right of the data, but found that a lot less readable than having them on lines in between the answers. Still not sure whether a better format is possible.
Solving #46 would make the output contain fewer unused sources.
Would love some feedback around this one before I merge!