bigcode-project / bigcode-evaluation-harness
A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0 · 771 stars · 201 forks

Issues
#22 · Update requirements.txt · Muennighoff · closed 1 year ago · 0 comments
#21 · Add revision kwarg · Muennighoff · closed 1 year ago · 0 comments
#20 · [WIP] Add CodeXGLUE-text-to-text benchmark for documentation translation · infinitylogesh · closed 1 year ago · 6 comments
#19 · Refactor code to separate tasks · loubnabnl · closed 1 year ago · 3 comments
#15 · Separate generation and evaluation + add CI · loubnabnl · closed 1 year ago · 0 comments
#14 · main() crashes with --allow-code-execution=True · ocramz · closed 1 year ago · 3 comments
#13 · Add CodeXGLUE-code-refinement (few-shot) setting · manandey · opened 1 year ago · 5 comments
#12 · MultiPL-E Integration · loubnabnl · closed 1 year ago · 4 comments
#11 · Consider a refactoring · lvwerra · closed 1 year ago · 2 comments
#10 · Library seems unnecessarily hardcoded · StellaAthena · closed 1 year ago · 4 comments
#9 · Spider zero-shot evaluation with execution accuracy metric · loubnabnl · closed 1 year ago · 0 comments
#17 · Design prompts for few-shot evaluation tasks · harm-devries · closed 1 year ago · 0 comments
#16 · Suggest tasks for the Evaluation Harness · harm-devries · closed 1 year ago · 7 comments
#18 · Add selected tasks to the Evaluation Harness · harm-devries · closed 1 year ago · 0 comments
#8 · Improve the prompt examples of the one-shot setting in APPS evaluation · loubnabnl · closed 1 year ago · 4 comments
#7 · Update the APPS few-shot setting · loubnabnl · closed 1 year ago · 0 comments
#6 · Add the HumanEval-X metric to the HF hub and the task to the harness · loubnabnl · closed 1 year ago · 4 comments
#5 · Add a TransCoder task for code translation · loubnabnl · opened 1 year ago · 4 comments
#4 · Add CodeXGLUE-code-refinement benchmark · loubnabnl · closed 10 months ago · 2 comments
#3 · Add CodeXGLUE-text-to-text benchmark for documentation translation · loubnabnl · closed 1 year ago · 2 comments
#2 · Add tests to the evaluation harness · loubnabnl · closed 1 year ago · 4 comments
#1 · Multilingual evaluation benchmarks · loubnabnl · closed 1 year ago · 1 comment