bigcode-project / bigcode-evaluation-harness
A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0 · 825 stars · 219 forks
Issues
#238 · Add a new dataset Mercury · Elfsong · closed 5 months ago · 2 comments
#237 · How can I pass num_beams? · Sunt-ing · closed 5 months ago · 2 comments
#236 · Fix MBPP bug with transformers 4.38+ · edgan8 · closed 5 months ago · 6 comments
#235 · Run 70b evaluation. · icoderzqliu · closed 6 months ago · 2 comments
#234 · API-based evaluation support (humanevalpack_openai.py is too old) · s-natsubori · opened 6 months ago · 0 comments
#233 · The result of llama2-13b-chat(pass@1 18) is worse than paper(pass@1 37) · moyi-qwq · closed 6 months ago · 2 comments
#232 · Add package name and enforce python version in `setup.py` · shehrozek-cerebras · closed 6 months ago · 4 comments
#231 · Make `bigcode_eval` pip installable · shehrozek-cerebras · closed 5 months ago · 1 comment
#230 · If I want to add my own designed prompts before each question, how should I modify the code · ALLISWELL8 · opened 7 months ago · 1 comment
#229 · Add StudentEval from LLM4Code 2024 · arjunguha · closed 7 months ago · 0 comments
#228 · The results of Llama3-8b pass@1 is worse than report · shuaiwang2022 · opened 7 months ago · 6 comments
#227 · ignore --use_auth_token if model doesn't require it · Vipitis · opened 7 months ago · 0 comments
#226 · [FR] include "config" data in generations_only · Vipitis · opened 7 months ago · 0 comments
#225 · fix: Multiple-E dataset fix go_test.go path for test execution · hitesh-1997 · opened 7 months ago · 3 comments
#224 · Multiple-E Go test file name suffix does not contain _test.go · hitesh-1997 · opened 7 months ago · 0 comments
#223 · refactor(evalplus): maintain mbpp+ v0.2.0 · ganler · closed 7 months ago · 0 comments
#222 · Add llama3 instruction prompts · TechxGenus · opened 7 months ago · 0 comments
#221 · Support for vLLM · noforit · opened 7 months ago · 0 comments
#220 · The results of codellama-7b-hf pass@1 is worse than paper · PeiqinSun · closed 7 months ago · 3 comments
#219 · Add instruct models prompts · loubnabnl · closed 7 months ago · 0 comments
#218 · run the MBPP in the HumanEval data format · virt9 · closed 5 months ago · 0 comments
#217 · Leaderboard README improvements · nikita1503 · opened 7 months ago · 0 comments
#216 · remove pad tokens added by the accelerator.pad_across_processes · IQ17 · opened 7 months ago · 0 comments
#215 · Please add flag to log score for each sample (akin to Eleuther's LM Evaluation Harness) · RylanSchaeffer · opened 7 months ago · 3 comments
#214 · Finetune starcoderbase-1b · SummCoder · opened 7 months ago · 0 comments
#213 · MultiPL-E generations step is hung · Santhoshkumar-p · opened 7 months ago · 0 comments
#212 · Ensure generations get saved in generation_only mode · Vipitis · opened 7 months ago · 0 comments
#211 · Check pass/fail count for humaneval · toptechie156 · closed 7 months ago · 1 comment
#210 · When evaluating with vLLM, how can I achieve multi-GPU data parallelism similar to HF? · noforit · closed 8 months ago · 0 comments
#209 · why change n_copies from 1 to 2? · Reeleon · opened 8 months ago · 0 comments
#208 · Add prompt · Muennighoff · closed 8 months ago · 0 comments
#207 · max_length_generation parameter · icoderzqliu · closed 5 months ago · 4 comments
#206 · fix apps evaluate error: local variable 'level' referenced before assignment · koking0 · opened 8 months ago · 0 comments
#205 · How to use my local model files to run the program? I cant download the model online · virt9 · closed 8 months ago · 4 comments
#204 · Update README.md · AnitaLiu98 · opened 8 months ago · 4 comments
#203 · Support for HumanEval-Infilling Benchmark · Hambaobao · closed 8 months ago · 1 comment
#202 · [Urgent Issue] Cannot run HumanEval benchmarking on CodeLlama model · cosmo3769 · closed 8 months ago · 6 comments
#201 · Add issue prompt · Muennighoff · closed 8 months ago · 0 comments
#200 · DS-F-ENG (ignore the name) evaluation task · sfc-gh-ajedrosz · closed 9 months ago · 0 comments
#199 · Add prompts · Muennighoff · closed 9 months ago · 0 comments
#198 · Support for StudentEval Dataset (Again) · guanqun-yang · opened 9 months ago · 1 comment
#197 · AATK process_results is missing · adiprasad · opened 9 months ago · 4 comments
#196 · Fix loading PAL-GSM few-shot examples · sxjscience · opened 9 months ago · 0 comments
#195 · To evaluate Github copilot? · liw8hz · opened 9 months ago · 0 comments
#194 · add support for codellama-70b prompt · loubnabnl · closed 9 months ago · 0 comments
#193 · [FEATURE REQUEST] Support HumanEval+ tests for MultiPL-E · Randl · opened 10 months ago · 0 comments
#192 · Potentially extra slow inference when using LoRA adapter · sadaisystems · opened 10 months ago · 1 comment
#191 · Out-of-memory of multi-gpu evaluation · yifan-bao · closed 5 months ago · 1 comment
#190 · Add mbpp+ evaluation task · ganler · closed 9 months ago · 0 comments
#189 · Make `main.py` compatible with OpenAI compatible APIs · hmellor · opened 10 months ago · 5 comments