Open Krisseck opened 6 months ago
Hi Krisseck,
There isn't currently a formal submission process for the creative writing test. However, if you have models or results to share, I will be happy to take a look and reproduce any that look interesting using claude-opus, for inclusion on the leaderboard.
If it's alright, I'll share some benchmark results here. No big surprises though.
Prompt format | Model | Score | Test |
---|---|---|---|
ChatML | TheBloke/CapybaraHermes-2.5-Mistral-7B-GGUF (Q8) | 52.74 | creative-writing |
ChatML | NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-GGUF (Q3_K_M) | 58.4 | creative-writing |
Alpaca | TheBloke/EstopianMaid-13B-GGUF (Q8) | 51.49 | creative-writing |
Alpaca | TheBloke/MythoMax-L2-13B-GGUF (Q8) | 52.1 | creative-writing |
Mistral | N8Programs/Coxcomb-GGUF (Q8) | 56.09 | creative-writing |
open-mixtral-8x22b on Mistral API

Thanks for sharing these results! I will see how they fare with claude opus as judge. I'm a bit surprised the numbers are so low actually, considering that mixtral-8x22b-instruct typically scored models significantly higher than this in its judgemark results: https://eqbench.com/results/judgemark_test_model_scores/judgemark_score_ci_mistralai__Mixtral-8x22B-Instruct-v0.1.png
Were you using oobabooga as the inference engine for these results?
I'd like to contribute to the Creative Writing benchmark.
Since I live in the EU, I do not have access to the Claude API. I am currently running it with Mixtral 8x22B as the judge via the Mistral API.
Can I contribute those results? Also, is there any tutorial on how to share the results, do I just make a PR? 🙂