bigcode-project / bigcode-evaluation-harness
A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0 · 771 stars · 201 forks

Issues
#22 · Update requirements.txt · Muennighoff · closed 1 year ago · 0 comments
#21 · Add revision kwarg · Muennighoff · closed 1 year ago · 0 comments
#20 · [WIP] Add CodeXGLUE-text-to-text benchmark for documentation translation · infinitylogesh · closed 1 year ago · 6 comments
#19 · Refactor code to separate tasks · loubnabnl · closed 1 year ago · 3 comments
#15 · Separate generation and evaluation + add CI · loubnabnl · closed 1 year ago · 0 comments
#14 · main() crashes with --allow-code-execution=True · ocramz · closed 1 year ago · 3 comments
#13 · Add CodeXGLUE-code-refinement (few-shot) setting · manandey · opened 1 year ago · 5 comments
#12 · MultiPL-E Integration · loubnabnl · closed 1 year ago · 4 comments
#11 · Consider a refactoring · lvwerra · closed 1 year ago · 2 comments
#10 · Library seems unnecessarily hardcoded · StellaAthena · closed 1 year ago · 4 comments
#9 · Spider zero-shot evaluation with execution accuracy metric · loubnabnl · closed 1 year ago · 0 comments
#17 · Design prompts for few-shot evaluation tasks · harm-devries · closed 1 year ago · 0 comments
#16 · Suggest tasks for the Evaluation Harness · harm-devries · closed 1 year ago · 7 comments
#18 · Add selected tasks to the Evaluation Harness · harm-devries · closed 1 year ago · 0 comments
#8 · Improve the prompt examples of the one-shot setting in APPS evaluation · loubnabnl · closed 1 year ago · 4 comments
#7 · Update the APPS few-shot setting · loubnabnl · closed 1 year ago · 0 comments
#6 · Add the HumanEval-X metric to the HF hub and the task to the harness · loubnabnl · closed 1 year ago · 4 comments
#5 · Add a TransCoder task for code translation · loubnabnl · opened 1 year ago · 4 comments
#4 · Add CodeXGLUE-code-refinement benchmark · loubnabnl · closed 10 months ago · 2 comments
#3 · Add CodeXGLUE-text-to-text benchmark for documentation translation · loubnabnl · closed 1 year ago · 2 comments
#2 · Add tests to the evaluation harness · loubnabnl · closed 1 year ago · 4 comments
#1 · Multilingual evaluation benchmarks · loubnabnl · closed 1 year ago · 1 comment