clembench/clembench-runs
All outputs generated by running the benchmark on different versions
MIT License
0
stars
5
forks
issues
June 24 results
#46
Gnurro
closed
2 days ago
0
Differentiating between Textual & Multimodal Results
#45
sherzod-hakimov
opened
1 week ago
0
add CHANGELOG
#44
davidschlangen
opened
2 weeks ago
1
update characteristics file for model analysis
#43
Nid989
closed
2 weeks ago
0
updated characteristics files for analysis
#42
Nid989
closed
3 weeks ago
0
results after matchit scoring update (commercial models)
#41
kushal-10
closed
3 weeks ago
0
results after matchit scoring update
#40
kushal-10
closed
3 weeks ago
0
updated matchit results on pentomino instances
#39
kushal-10
closed
3 weeks ago
0
fix typo error; characteristics file
#38
Nid989
closed
3 weeks ago
0
update files for model analysis notebook
#37
Nid989
closed
3 weeks ago
0
update idefics results on mm_referencegame
#36
kushal-10
closed
3 weeks ago
0
adding missing scores
#35
yanweiser
closed
3 weeks ago
0
no backend variations in main results directory
#34
davidschlangen
opened
4 weeks ago
0
Add llama-3-8b runs for groq, together.ai
#33
Nid989
closed
4 weeks ago
0
updated open multimodal model results
#32
kushal-10
closed
4 weeks ago
0
Update "data_type" issue str to int64 for attribute columns
#31
Nid989
closed
4 weeks ago
0
Update "data_type" issue str to int64 for attribute columns
#30
Nid989
closed
1 month ago
0
Update notebook model characteristics + Add llama-3-70b runs for groq, anyscale, together.ai
#29
Nid989
closed
1 month ago
0
updated results
#28
kushal-10
closed
1 month ago
0
added llama-3-70b runs (groq, together.ai, anyscale)
#27
Nid989
closed
1 month ago
0
add mapworld results
#26
kushal-10
closed
1 month ago
0
Llama3-hf local results
#25
Gnurro
closed
1 month ago
0
add textmapworld results
#24
kushal-10
closed
1 month ago
0
bug fix "analysis notebook"
#23
Nid989
closed
1 month ago
0
Added notebooks for analysis
#22
Nid989
closed
1 month ago
0
Results with updated matchit_ascii
#21
kushal-10
closed
1 month ago
0
add multimodal referencegame results
#20
kushal-10
closed
1 month ago
0
add matchit and matchit_ascii results
#19
kushal-10
closed
1 month ago
0
`dolphin-2.5-mixtral-8x7b`, `mistral-7b-instruct-v0.2` results via together.AI
#18
Nid989
closed
1 month ago
0
Yi1.5 and Starling-7b-beta results
#17
Gnurro
closed
1 month ago
0
v1.6 results for quantized GGUF models
#16
Gnurro
closed
1 month ago
0
qwen1.5-72b, nous-hermes-2-mixtral-8x7b results via together.AI
#15
Nid989
closed
1 month ago
0
add matchit results
#14
kushal-10
closed
1 month ago
0
Add Meta-Llama-3-70B-Instruct-GGUF-q4 results
#13
Gnurro
closed
1 month ago
0
add Idefics results
#12
kushal-10
closed
1 month ago
0
New HF results
#11
Gnurro
closed
1 month ago
0
Small quantized model test run results
#10
Gnurro
closed
1 month ago
0
Add multimodal HF model results from recent testing run
#9
Gnurro
closed
1 month ago
0
Update with results from recent reruns
#8
Gnurro
closed
1 month ago
0
Adding results of recent 1.5 runs
#7
Gnurro
closed
3 months ago
0
Add openchat-3.5-0106, openchat-3.5-1210 and Nous-Hermes2-Mixtral-DPO results
#6
Gnurro
closed
4 months ago
0
Results by pairing
#5
Gnurro
closed
5 months ago
0
Adding SUSTech/SUS-Chat-34B results
#4
Gnurro
closed
5 months ago
0
Add fixed Mistral results for imagegame and referencegame
#3
Gnurro
closed
6 months ago
0
Add Tulu-2, DeepSeek and Mixtral runs
#2
Gnurro
closed
6 months ago
0
Add Yi-34B-Chat and openchat-3.5 results
#1
Gnurro
closed
7 months ago
0