clp-research/clembench
A Framework for the Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents and an Extensible Benchmark
MIT License
19 stars, 26 forks
Issues (sorted by newest)
#48 issue #47: add option to set max_token to be generated (phisad; closed 4 months ago; 1 comment)
#47 Make max_new_tokens available in cli script (sherzod-hakimov; closed 4 months ago; 2 comments)
#46 Add models to HF model registry (Gnurro; closed 4 months ago; 0 comments)
#45 Preview/hf backend refactor (phisad; closed 4 months ago; 0 comments)
#44 Add option to specify filename for instances in GameInstanceGenerator (cleaned) (phisad; closed 5 months ago; 1 comment)
#43 Separate Game Scoring from GameMaster and extract functionality into a GameScorer class (lpfennigschmidt; closed 4 months ago; 0 comments)
#42 Feature dialogue gamemaster reprompting, implements #41 (lpfennigschmidt; closed 5 months ago; 0 comments)
#41 [feature] Enable easy re-prompting mechanism in DialogueGameMaster (lpfennigschmidt; closed 5 months ago; 6 comments)
#40 [refactoring] Extract score computation into its own class (lpfennigschmidt; closed 4 months ago; 3 comments)
#39 Kwargs for player calls (yerkesoul; closed 2 months ago; 7 comments)
#38 Add option to specify filename for instances in GameInstanceGenerator (AnneBeyer; closed 5 months ago; 1 comment)
#37 HF backend refactor (Gnurro; closed 5 months ago; 0 comments)
#36 [framework] Make it easier to select instances that are going to be used -- for example, different languages (davidschlangen; closed 3 months ago; 17 comments)
#35 Feat/refactor model pairing (phisad; closed 5 months ago; 0 comments)
#34 allow users to provide model identifiers as arguments (instead of options) (phisad; closed 5 months ago; 5 comments)
#33 Feat/results by pairing (phisad; closed 5 months ago; 0 comments)
#32 Hf model updates (Gnurro; closed 5 months ago; 0 comments)
#31 DOC: Adding new models to HF backend (Gnurro; closed 5 months ago; 0 comments)
#30 [backend] decision about switching to some library that provides wrapper around accessing open-weight models locally (sherzod-hakimov; closed 5 months ago; 2 comments)
#29 HF backend model additions: Yi-34B-Chat, Openchat-3.5, Tulu-2, DeepSeek and Mixtral (Gnurro; closed 6 months ago; 0 comments)
#28 [backends] Consolidate huggingface backends (davidschlangen; closed 4 months ago; 7 comments)
#27 "relaxed parsing mode" for games? relaxed move rules... (davidschlangen; open, opened 7 months ago; 11 comments)
#26 revamp mechanism for mapping model names to backends (and further information) (davidschlangen; closed 4 months ago; 43 comments)
#25 documentation on benchmarking and updating leaderboard (sherzod-hakimov; closed 3 months ago; 2 comments)
#24 private/shared should ensure proper turn taking (assistant / user / assistant) in messages (davidschlangen; closed 5 months ago; 10 comments)
#23 wordle game should use standard method for checking whether it runs over max context size (davidschlangen; open, opened 7 months ago; 1 comment)
#22 backends should expose a method for getting size in tokens / max context size violation (davidschlangen; closed 2 months ago; 16 comments)
#21 [docs] clarify what the backends must return (briemadu; closed 7 months ago; 0 comments)
#20 max tokens in backends should not be hardcoded (davidschlangen; closed 4 months ago; 3 comments)
#19 use global variables instead of strings in openai backend (davidschlangen; closed 5 months ago; 3 comments)
#18 Local backends fix (Gnurro; closed 7 months ago; 1 comment)
#17 Huggingface backend (Gnurro; closed 7 months ago; 1 comment)
#16 Hf sync test (Gnurro; closed 7 months ago; 1 comment)
#15 double check huggingface backend, apparently does not get passed temperature parameter correctly (davidschlangen; closed 3 months ago; 2 comments)
#14 add workflow documentation for adding to leaderboard (davidschlangen; closed 4 months ago; 2 comments)
#13 Huggingface (Gnurro; closed 8 months ago; 0 comments)
#12 make it possible to determine programmatically which `instances.json` is being used for a run (davidschlangen; closed 4 months ago; 2 comments)
#11 overhaul documentation on how to add games (davidschlangen; closed 4 months ago; 3 comments)
#10 script pipeline_huggingfaces.sh overwrite existing key.json (davidschlangen; closed 7 months ago; 3 comments)
#9 improve tokenizer in taboo (davidschlangen; closed 4 months ago; 1 comment)
#8 remove GPT4 gymnastics from `pipeline_clembench.sh` (davidschlangen; closed 8 months ago; 0 comments)
#7 make path for results configurable (davidschlangen; closed 3 months ago; 4 comments)
#6 documentation `howto_run_benchmark.md` is incomplete / unhelpful (davidschlangen; closed 7 months ago; 0 comments)
#5 running game fails silently on wrong parameters (davidschlangen; closed 8 months ago; 3 comments)
#4 `scripts/cli.py ls` broken? (davidschlangen; closed 8 months ago; 3 comments)
#3 Llama2 backends (Gnurro; closed 7 months ago; 1 comment)
#2 Huggingface backend refactor (Gnurro; closed 8 months ago; 0 comments)
#1 Fix swapped slots in probing_questions.json (briemadu; closed 1 year ago; 0 comments)