dezoito / ollama-grid-search

A multi-platform desktop application to evaluate and compare LLM models, written in Rust and React.
MIT License
511 stars, 31 forks

feat: Hide model names #15

Status: Closed (notasquid1938 closed this issue 6 months ago)

notasquid1938 commented 7 months ago

The option to hide model names can help eliminate personal bias, especially when comparing different models. Also, is there any plan to use this for Elo comparisons, like a locally running, personal version of https://chat.lmsys.org?

dezoito commented 7 months ago

I like the idea of allowing model names to be hidden - at least in the immediate results view.

On the Elo comparisons, could you please expand?

How do you see that working in terms of the UI?

Would you track evals based on the model's name only, or also on the combination of parameters used in each experiment iteration?

notasquid1938 commented 7 months ago

This would require significant effort, so I'm mostly writing this out to show how I envision it:

Each configuration of model and parameters would be treated as a different competitor.

They all start with the same initial Elo rating. Each time you pick one model and configuration over another, both competitors' ratings are adjusted. At the end, a leaderboard would show the final Elo rating of each model and configuration.
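The thread doesn't specify how ratings would be adjusted; assuming the standard Elo update, a vote between competitors A and B (ratings $R_A$, $R_B$) works as follows, where $S_A = 1$ if A's response is picked and $0$ otherwise, and $K$ is a tuning constant (e.g. 32, an assumed value):

```latex
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad E_B = 1 - E_A, \qquad S_B = 1 - S_A

R_A' = R_A + K\,(S_A - E_A), \qquad R_B' = R_B + K\,(S_B - E_B)
```

For example, with $K = 32$ and both competitors starting at 1000, $E_A = 0.5$, so the winner moves to 1016 and the loser to 984. The two changes always cancel, so the total rating across the pool stays constant.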

This would work best if multiple prompts could be fed in, similar to how the https://chat.lmsys.org arena works. For instance, say I test 3 models, each with 2 different parameter sets, for 6 competitors in total. I write out a list of prompts; the number needed should grow with the number of competitors. For each prompt, the program picks two random competitors and asks me to vote for the better response, repeating this until all my prompts are used. Given 20-30 prompts, a clear trend should emerge in the Elo ratings of the 6 competitors, showing which model and parameters produce the preferred responses and how far apart each combination is.
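A minimal TypeScript sketch of the arena loop described above, assuming the standard Elo update with a K-factor of 32 and a starting rating of 1000 (both assumed values, not from the thread). `Competitor`, `pickPair`, `recordVote`, and `runArena` are hypothetical names and do not exist in ollama-grid-search; the `vote` callback stands in for the UI that would show the two responses with model names hidden and return the user's choice.

```typescript
// Hypothetical sketch of a pairwise "arena"; not part of ollama-grid-search.

interface Competitor {
  id: string; // one model + parameter combination, e.g. "llama3 / temperature=0.2"
  rating: number;
}

const K = 32; // assumed K-factor: maximum rating change per vote
const INITIAL_RATING = 1000; // assumed starting rating for every competitor

// Expected score of a player rated `ra` against one rated `rb` (standard Elo curve).
function expectedScore(ra: number, rb: number): number {
  return 1 / (1 + Math.pow(10, (rb - ra) / 400));
}

// Apply one vote: the winner gains exactly what the loser gives up.
function recordVote(winner: Competitor, loser: Competitor): void {
  const delta = K * (1 - expectedScore(winner.rating, loser.rating));
  winner.rating += delta;
  loser.rating -= delta;
}

// Pick two distinct competitors uniformly at random (requires pool.length >= 2).
function pickPair(
  pool: Competitor[],
  rand: () => number,
): [Competitor, Competitor] {
  const i = Math.floor(rand() * pool.length);
  let j = Math.floor(rand() * (pool.length - 1));
  if (j >= i) j += 1; // skip index i so the two picks never coincide
  return [pool[i], pool[j]];
}

// Stand-in for the UI: returns true if the first competitor's response won.
type Vote = (prompt: string, a: Competitor, b: Competitor) => boolean;

// Run one random matchup per prompt, then return the leaderboard (best first).
function runArena(
  ids: string[],
  prompts: string[],
  vote: Vote,
  rand: () => number = Math.random,
): Competitor[] {
  const pool: Competitor[] = ids.map((id) => ({ id, rating: INITIAL_RATING }));
  for (const prompt of prompts) {
    const [a, b] = pickPair(pool, rand);
    if (vote(prompt, a, b)) {
      recordVote(a, b);
    } else {
      recordVote(b, a);
    }
  }
  return [...pool].sort((x, y) => y.rating - x.rating);
}
```

Passing `rand` as a parameter keeps the pairing reproducible in tests; in the app it would default to `Math.random`. Because each vote moves ratings by at most `K`, the 20-30 prompts suggested above would typically separate strong and weak competitors but may leave close ones within noise.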

dezoito commented 7 months ago

@notasquid1938 ,

Thank you for the clarification.

Although I can see some similarities, I agree that it would take considerable effort (especially since we don't have a database layer in place yet) and would diverge from what I had in mind when I started the project.

It's an interesting idea and I'd be happy if someone could make a fork and work on it.

dezoito commented 6 months ago

> The option to hide model names can help eliminate personal bias especially when comparing different models.

@notasquid1938

Added in v0.5.0