dezoito / ollama-grid-search

A multi-platform desktop application to evaluate and compare LLM models, written in Rust and React.
MIT License
507 stars 31 forks

A/B test user prompts #18

Closed calebsheridan closed 6 months ago

calebsheridan commented 7 months ago

Add ability for A/B testing user prompts.

Notes:

dezoito commented 7 months ago

Hi @calebsheridan.

Can you provide a description of what the PR does?

calebsheridan commented 7 months ago

@dezoito added description

calebsheridan commented 7 months ago

[Screenshots: 2024-04-30 at 10:49:44, 10:49:54, and 10:50:12]

dezoito commented 7 months ago

@calebsheridan, first of all, thank you again for the PRs and for the effort and detail you've put into the updates.

I really like how you solved the "multi-prompt problem" while keeping the interface intuitive and clean. I would like to discuss some issues before merging, if you are OK with it:

1) System Prompt

The merge caused the system prompt to stop working with the prompt dialog (it is temporarily disabled).

Do you see any way that could be added back?

Some people write complex system prompts, and I wish we could retain the ability to open a large "editor" so that they don't have to switch to a different program to make changes comfortably.

2) Displaying prompts for each iteration

Another issue is how to display the prompts used for each result.

Consider the current screenshot below:

image

I feel like it would be useful to retain formatting (like line breaks) when displaying the prompts. The current order and colors also make it difficult to tell the prompt apart from the rest of the parameters.

When inspecting past experiments, this is a little clearer (although I admit I am not preserving the line breaks yet):

image

I'd like your opinion on two possible approaches:

2.1- Move the prompt to the bottom of the inference parameters, and maybe add some spacing/different color to differentiate it from the other parameters.

OR

2.2- Put the prompt in an "accordion" at the bottom of the inference parameters, and use just the first "N" characters as the accordion trigger.

I feel like option 2.2 would work better for large prompts. In both options, and in the experiment results, line breaks should be preserved.
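
To make option 2.2 more concrete, here is a minimal sketch of how the accordion trigger text could be derived from a prompt. The function name `promptPreview` and its `maxChars` parameter are hypothetical and not part of the existing code; the only assumption is that the prompt is available as a plain string.

```typescript
// Hypothetical helper (not in the current codebase): builds the short,
// single-line label used as the accordion trigger for a prompt.
function promptPreview(prompt: string, maxChars: number = 60): string {
  // Collapse all whitespace runs, including line breaks, into single spaces
  // so the trigger always fits on one line.
  const singleLine = prompt.replace(/\s+/g, " ").trim();
  if (singleLine.length <= maxChars) {
    return singleLine;
  }
  // Keep only the first `maxChars` characters and mark the truncation.
  return singleLine.slice(0, maxChars).trimEnd() + "…";
}
```

The trigger only needs the flattened preview. The accordion body would render the original, unmodified prompt string, and since the project already uses Tailwind classes, a utility such as `whitespace-pre-wrap` on the container is one way to keep the line breaks visible there.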

3) Display prompts when inspecting past experiments.

Currently, since all inferences use the same prompt, the ExperimentDataDialog component just uses the one stored with the first inference:


                    <div className="p-1 font-mono text-gray-700 dark:text-gray-400">
                      {data.inferences[0].parameters.system_prompt}
                    </div>
                    <div className="p-1 font-mono text-gray-700 dark:text-gray-400">
                      {data.inferences[0].parameters.prompt}
                    </div>

I feel like we could keep this logic for the System Prompt, but each iteration should display its corresponding prompt somehow (possibly using the same component mentioned in the previous point).
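
As a rough sketch of that idea: the system prompt is still read from the first inference, while the user prompts are collected one per inference. The interfaces below are assumptions inferred from the snippet above (`data.inferences[i].parameters`), and `collectPrompts` is a hypothetical name; the real types in the project may differ.

```typescript
// Assumed shapes, inferred from the JSX snippet above.
interface InferenceParameters {
  system_prompt: string;
  prompt: string;
}

interface Inference {
  parameters: InferenceParameters;
}

interface ExperimentData {
  inferences: Inference[];
}

// Hypothetical view model for the dialog.
interface PromptView {
  systemPrompt: string; // shared by all inferences, taken from the first one
  prompts: string[];    // one entry per inference, in the same order
}

function collectPrompts(data: ExperimentData): PromptView {
  const first = data.inferences[0];
  return {
    // Fall back to an empty string when the experiment has no inferences.
    systemPrompt: first ? first.parameters.system_prompt : "",
    prompts: data.inferences.map((inference) => inference.parameters.prompt),
  };
}
```

The dialog could then render `systemPrompt` once at the top and show `prompts[i]` next to the result of inference `i`.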


I'm willing to work on points 2 and 3, but it might take some time until I can touch this.

Please let me know how you feel about these observations.

calebsheridan commented 7 months ago
  1. OK
  2. OK, we can try both
  3. OK

At some point, it would be nice to test multiple system prompts also.

For prompts in general, I felt that a nice extension to this PR would be a local library of prompts where each prompt can be selected/deselected instead of simply added or removed (in other words, similar to how model selection works now). See https://github.com/dezoito/ollama-grid-search/issues/20
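
A minimal sketch of what such a library could look like as data, assuming each stored prompt carries a selection flag. All names here (`LibraryPrompt`, `togglePrompt`, `selectedPrompts`) are hypothetical and only illustrate the select/deselect idea; nothing like this exists in the project yet.

```typescript
// Hypothetical entry in a local prompt library.
interface LibraryPrompt {
  id: string;
  text: string;
  selected: boolean; // whether the prompt takes part in the next experiment
}

// Returns a new library with the selection flag of one prompt flipped.
// The input array is left untouched, which suits React state updates.
function togglePrompt(library: LibraryPrompt[], id: string): LibraryPrompt[] {
  return library.map((p) =>
    p.id === id ? { ...p, selected: !p.selected } : p
  );
}

// Returns the texts of all currently selected prompts, in library order.
function selectedPrompts(library: LibraryPrompt[]): string[] {
  return library.filter((p) => p.selected).map((p) => p.text);
}
```

With this shape, deselecting a prompt keeps it in the library for later experiments instead of deleting it.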

dezoito commented 7 months ago
> 1. OK
> 2. OK, we can try both
> 3. OK

Thank you!

> At some point, it would be nice to test multiple system prompts also.
>
> For prompts in general, I felt that a nice extension to this PR would be a local library of prompts where each prompt can be selected/deselected instead of simply added or removed (in other words, similar to how model selection works now). See #20

I agree on both points and will continue this discussion in #20.

dezoito commented 6 months ago

Merged to main. Thank you, @calebsheridan!

I'll update the README to highlight the new features and try to work on the remaining updates, then generate a new release.