clp-research/clembench
A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark
MIT License
19 stars
26 forks
issues
[wordle] fix re-prompt prompts
#98
davidschlangen
opened
6 days ago
1
Model additions
#97
Gnurro
opened
1 week ago
1
add multimodal backend
#96
kushal-10
closed
2 weeks ago
0
[games] make “abort” consistent
#95
davidschlangen
opened
2 weeks ago
0
[image game] INSTRUCTION not a good label?
#94
davidschlangen
opened
1 month ago
1
Feat/multimodal backend
#93
sherzod-hakimov
closed
1 month ago
0
New model entries for May 2024
#92
Gnurro
closed
1 month ago
1
[wordle] wordle h/h has a bug -- check if wordle clemgame does as well
#91
davidschlangen
opened
2 months ago
1
[wordle] reconsider what is used as quality score
#90
davidschlangen
opened
2 months ago
0
Add model OpenAI gpt-4-vision-preview
#89
yanweiser
closed
1 month ago
2
Llamacpp model registry fix
#88
Gnurro
closed
2 months ago
0
[games] Do another round of prompt criticism / unification. For clembench 1.6
#87
davidschlangen
opened
2 months ago
0
[framework] rethink response parsing
#86
davidschlangen
opened
2 months ago
0
Double model loading/initialization
#85
Gnurro
closed
2 months ago
3
update multilingual branch with new models
#84
AnneBeyer
closed
2 months ago
0
[leaderboard] rethink approach to deciding on which models to test
#83
davidschlangen
opened
2 months ago
1
Proper step-by-step guide for game creation
#82
Gnurro
opened
2 months ago
6
Llamacpp backend
#81
Gnurro
closed
2 months ago
3
[evaluation] check bencheval score if certain episodes don't have scores file
#80
sherzod-hakimov
closed
2 weeks ago
6
continue until no messages left to fix
#79
sherzod-hakimov
closed
2 months ago
1
[framework] Integrate External API Backends (Together) that support many models
#78
sherzod-hakimov
opened
2 months ago
1
player calls method to ensure alternating roles in messages history
#77
phisad
closed
2 months ago
3
[games] claude models seem to have problem with taboo
#76
davidschlangen
closed
2 months ago
2
HF backend fix: Prevent excessive generation warnings
#75
Gnurro
closed
3 months ago
0
Basic direct Model usage documentation
#74
Gnurro
closed
3 months ago
0
[referencegame] regex fix
#73
AnneBeyer
closed
2 months ago
1
HF model additions 03.04.2024
#72
Gnurro
closed
3 months ago
0
[games] Open questions on private/shared game
#71
briemadu
opened
3 months ago
2
model responses differ when run locally and when run via fastchat API
#70
davidschlangen
opened
3 months ago
0
multilingual referencegame
#69
AnneBeyer
closed
3 months ago
0
Allow game-specific arguments for instance generation
#68
AnneBeyer
closed
2 months ago
0
[framework] integrate gpt & claude vision API
#67
sherzod-hakimov
closed
2 weeks ago
0
add option to set results dir
#66
phisad
closed
3 months ago
0
Adding google/gemma-7b-it HF-local support
#65
Gnurro
closed
4 months ago
0
Documentation Updates February 2024
#64
Gnurro
closed
4 months ago
0
[backend] Missing Huggingface backend logging
#63
Gnurro
closed
3 months ago
3
[structure] separate framework from benchmark (from games?) / make pip installable
#62
davidschlangen
opened
4 months ago
1
add common clem_player info to response obj
#61
phisad
closed
4 months ago
0
[framework] remove special handling of programmatic Players
#60
phisad
opened
4 months ago
0
Add pad_token_id handling to prevent excessive warnings
#59
Gnurro
closed
4 months ago
1
[scoring] ensure that all scores use the full range 0:100
#58
davidschlangen
opened
4 months ago
1
feat/taboo instances v1.5
#57
phisad
closed
4 months ago
0
[framework] future-proof: allow any kind of data in conversation
#56
davidschlangen
opened
4 months ago
2
[documentation] add doc on how to add a new model
#55
davidschlangen
opened
4 months ago
2
[documentation] `howto_add_backend.md` is obsolete / outdated?
#54
davidschlangen
closed
2 months ago
1
Merging clembench v1.0
#53
lpfennigschmidt
closed
4 months ago
0
add cli option to specify instance file name
#52
phisad
closed
4 months ago
0
Feat/model registry
#51
phisad
closed
4 months ago
0
[enhancement] Suppress backend warnings when running experiments
#50
lpfennigschmidt
closed
2 months ago
10
Preview feat/model registry
#49
lpfennigschmidt
closed
4 months ago
1