clp-research/clembench
A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark
MIT License
25 stars
34 forks
issues
Multilingual referencegame and imagegame (adjustments)
#136
SandraNeuhaeus
closed
5 days ago
0
Add Qwen2.5-Coder-Instruct models to the registry
#135
Gnurro
opened
1 week ago
0
Update Multimodal Backend from v1.5 to v1.6.5
#134
kushal-10
closed
2 weeks ago
0
[games] make separation between games and worlds clearer
#133
davidschlangen
opened
2 weeks ago
2
Merge changes in Image & Wordle games
#132
sherzod-hakimov
closed
2 weeks ago
0
Nemotron model addition
#131
Gnurro
opened
1 month ago
0
Multilingual referencegame and imagegame
#130
SandraNeuhaeus
closed
1 week ago
0
Add BSC-LT/salamandra-7b-instruct model entry to model registry
#129
Gnurro
opened
1 month ago
0
Llama-3.2 and EuroLLM model entries
#128
Gnurro
closed
2 months ago
0
Add Qwen2.5-7B/14B/32B/72B-Instruct model entries
#127
Gnurro
closed
2 months ago
0
update dates to ISO format
#126
kushal-10
closed
2 months ago
0
[documentation] run pdoc or similar to automatically create API documentation
#125
davidschlangen
opened
2 months ago
6
[documentation] Add "how_to_contribute" doc
#124
davidschlangen
opened
2 months ago
0
[backend] release date in model_registry must be in ISO format
#123
davidschlangen
closed
1 month ago
2
[wordle] Context limit exceeding not handled properly
#122
Gnurro
closed
2 months ago
6
plotting the generated graphs next to the target graph
#121
yanweiser
closed
2 months ago
0
Update EOS culling to proper regEx substitution
#120
Gnurro
closed
2 months ago
2
Hardcode hotfix to handle model entry eos_to_cull containing "|"
#119
Gnurro
closed
2 months ago
2
[backends] Fishy "||" in interactions.json
#118
Antonia-Schmidt
closed
1 month ago
4
[documentation] update howto_run_games_locally
#117
AnneBeyer
closed
2 months ago
0
vLLM backend
#116
Gnurro
closed
2 weeks ago
5
fix encoding in openai-backend; update and add documentation notebooks
#115
davidschlangen
closed
3 months ago
0
[repository] update main README.txt
#114
davidschlangen
closed
2 months ago
0
Including Codenames in clembench
#113
lpfennigschmidt
closed
1 month ago
0
[backends] Benchmark configuration file to change settings such as openai-compatible API backend
#112
Gnurro
opened
3 months ago
2
Add gemma-2-2b-it model registry entry
#111
Gnurro
closed
3 months ago
0
Missing logs for episodes with exceptions
#110
Gnurro
opened
3 months ago
1
add release-dates and open-weight flag
#109
kushal-10
closed
3 months ago
0
changing mm_mapworld games to show images in the transcripts
#108
yanweiser
closed
4 months ago
1
Logging and scoring documentation update
#107
Gnurro
closed
4 months ago
0
Add Mistral-Large-Instruct-2407 to model registry
#106
Gnurro
closed
4 months ago
0
Refactor multimodal backend
#105
kushal-10
closed
3 months ago
0
Add Meta-Llama-3.1-8B-Instruct and Meta-Llama-3.1-70B-Instruct
#104
Gnurro
closed
4 months ago
0
Add Meta-Llama-3.1 support
#103
Gnurro
closed
4 months ago
0
Add Llama-3-SauerkrautLM-70b-Instruct to the model registry
#102
Gnurro
closed
4 months ago
0
Update README.md
#101
davidschlangen
closed
4 months ago
0
[analysis] LLM Calculator: inference parameters, like latency etc. - find the best models based on filters
#100
davidschlangen
opened
4 months ago
2
[framework] game registry
#99
davidschlangen
opened
4 months ago
20
[wordle] fix re-prompt prompts
#98
davidschlangen
opened
4 months ago
1
Model additions
#97
Gnurro
closed
4 months ago
1
add multimodal backend
#96
kushal-10
closed
5 months ago
0
[games] make “abort” consistent
#95
davidschlangen
opened
5 months ago
0
[image game] INSTRUCTION not a good label?
#94
davidschlangen
closed
1 month ago
4
Feat/multimodal backend
#93
sherzod-hakimov
closed
6 months ago
0
New model entries for May 2024
#92
Gnurro
closed
6 months ago
1
[wordle] wordle h/h has a bug -- check if wordle clemgame does as well
#91
davidschlangen
closed
4 months ago
1
[wordle] reconsider what is used as quality score
#90
davidschlangen
closed
2 months ago
1
Add model OpenAI gpt-4-vision-preview
#89
yanweiser
closed
6 months ago
2
Llamacpp model registry fix
#88
Gnurro
closed
6 months ago
0
[games] Do another round of prompt criticism / unification. For clembench 1.6
#87
davidschlangen
opened
6 months ago
1