clp-research / clembench

A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark

MIT License

25 stars, 34 forks

issues

Multilingual referencegame and imagegame (adjustments)

#136 SandraNeuhaeus closed 5 days ago
0
Add Qwen2.5-Coder-Instruct models to the registry

#135 Gnurro opened 1 week ago
0
Update Multimodal Backend from v1.5 to v1.6.5

#134 kushal-10 closed 2 weeks ago
0
[games] make separation between games and worlds clearer

#133 davidschlangen opened 2 weeks ago
2
Merge changes in Image & Wordle games

#132 sherzod-hakimov closed 2 weeks ago
0
Nemotron model addition

#131 Gnurro opened 1 month ago
0
Multilingual referencegame and imagegame

#130 SandraNeuhaeus closed 1 week ago
0
Add BSC-LT/salamandra-7b-instruct model entry to model registry

#129 Gnurro opened 1 month ago
0
Llama-3.2 and EuroLLM model entries

#128 Gnurro closed 2 months ago
0
Add Qwen2.5-7B/14B/32B/72B-Instruct model entries

#127 Gnurro closed 2 months ago
0
update dates to ISO format

#126 kushal-10 closed 2 months ago
0
[documentation] run pdoc or similar to automatically create API documentation

#125 davidschlangen opened 2 months ago
6
[documentation] Add "how_to_contribute" doc

#124 davidschlangen opened 2 months ago
0
[backend] release date in model_registry must be in ISO format

#123 davidschlangen closed 1 month ago
2
[wordle] Context limit exceeding not handled properly

#122 Gnurro closed 2 months ago
6
plotting the generated graphs next to the target graph

#121 yanweiser closed 2 months ago
0
Update EOS culling to proper regEx substitution

#120 Gnurro closed 2 months ago
2
Hardcode hotfix to handle model entry eos_to_cull containing "|"

#119 Gnurro closed 2 months ago
2
[backends] Fishy "||" in interactions.json

#118 Antonia-Schmidt closed 1 month ago
4
[documentation] update howto_run_games_locally

#117 AnneBeyer closed 2 months ago
0
vLLM backend

#116 Gnurro closed 2 weeks ago
5
fix encoding in openai-backend; update and add documentation notebooks

#115 davidschlangen closed 3 months ago
0
[repository] update main README.txt

#114 davidschlangen closed 2 months ago
0
Including Codenames in clembench

#113 lpfennigschmidt closed 1 month ago
0
[backends] Benchmark configuration file to change settings such as openai-compatible API backend

#112 Gnurro opened 3 months ago
2
Add gemma-2-2b-it model registry entry

#111 Gnurro closed 3 months ago
0
Missing logs for episodes with exceptions

#110 Gnurro opened 3 months ago
1
add release-dates and open-weight flag

#109 kushal-10 closed 3 months ago
0
changing mm_mapworld games to show images in the transcripts

#108 yanweiser closed 4 months ago
1
Logging and scoring documentation update

#107 Gnurro closed 4 months ago
0
Add Mistral-Large-Instruct-2407 to model registry

#106 Gnurro closed 4 months ago
0
Refactor multimodal backend

#105 kushal-10 closed 3 months ago
0
Add Meta-Llama-3.1-8B-Instruct and Meta-Llama-3.1-70B-Instruct

#104 Gnurro closed 4 months ago
0
Add Meta-Llama-3.1 support

#103 Gnurro closed 4 months ago
0
Add Llama-3-SauerkrautLM-70b-Instruct to the model registry

#102 Gnurro closed 4 months ago
0
Update README.md

#101 davidschlangen closed 4 months ago
0
[analysis] LLM Calculator: inference parameters, like latency etc. - find the best models based on filters

#100 davidschlangen opened 4 months ago
2
[framework] game registry

#99 davidschlangen opened 4 months ago
20
[wordle] fix re-prompt prompts

#98 davidschlangen opened 4 months ago
1
Model additions

#97 Gnurro closed 4 months ago
1
add multimodal backend

#96 kushal-10 closed 5 months ago
0
[games] make “abort” consistent

#95 davidschlangen opened 5 months ago
0
[image game] INSTRUCTION not a good label?

#94 davidschlangen closed 1 month ago
4
Feat/multimodal backend

#93 sherzod-hakimov closed 6 months ago
0
New model entries for May 2024

#92 Gnurro closed 6 months ago
1
[wordle] wordle h/h has a bug -- check if wordle clemgame does as well

#91 davidschlangen closed 4 months ago
1
[wordle] reconsider what is used as quality score

#90 davidschlangen closed 2 months ago
1
Add model OpenAI gpt-4-vision-preview

#89 yanweiser closed 6 months ago
2
Llamacpp model registry fix

#88 Gnurro closed 6 months ago
0
[games] Do another round of prompt criticism / unification. For clembench 1.6

#87 davidschlangen opened 6 months ago
1