bigcode-project / bigcode-evaluation-harness

A framework for the evaluation of autoregressive code generation language models.

Apache License 2.0

744 stars 193 forks

issues

Program repair

#64 keyboardAnt opened 1 year ago
3
update humaneval postprocessing

#63 loubnabnl closed 1 year ago
1
Rename path arguments

#62 loubnabnl closed 1 year ago
0
remove model from accelerate prepare and add precision argument

#61 loubnabnl closed 1 year ago
0
Add file name support for our method.

#60 SivilTaram closed 1 year ago
0
[WIS] Program repair

#59 keyboardAnt closed 1 year ago
0
Would be better to save generations on the fly

#58 Muennighoff opened 1 year ago
0
chat bug fixer for humaneval-x-bugs

#57 mitya52 closed 1 year ago
3
[WIP] Add the BugRepair Task evaluation

#56 keyboardAnt closed 1 year ago
0
new bullet point

#55 ArmelRandy closed 1 year ago
0
Fix number of copies per task when nsamples is not proportional to batch size

#54 loubnabnl closed 1 year ago
0
Update main.py

#53 SivilTaram closed 1 year ago
0
How to evaluate the model memory efficiently?

#52 Godofnothing closed 1 year ago
6
Error Running Odex Integration Code

#51 murthyrudra closed 1 year ago
2
Fix stop words

#50 Muennighoff closed 1 year ago
1
Variable max_length_generation

#49 Muennighoff opened 1 year ago
0
Query around n_samples argument

#48 murthyrudra closed 1 year ago
7
Commit / Edit / Diff models & their evaluation

#47 Muennighoff closed 1 year ago
1
HumanEval post-processing

#46 RaymondLi0 closed 1 year ago
2
add odex and mconala datasets

#45 zorazrw closed 8 months ago
1
Integrate MultiPL-E

#44 loubnabnl closed 1 year ago
4
Change bool arguments format

#43 loubnabnl closed 1 year ago
0
Add setup file

#42 loubnabnl closed 1 year ago
1
update readme with trust_remote_code arg

#41 loubnabnl closed 1 year ago
0
add trust_remote_code and use_auth_token args

#40 loubnabnl closed 1 year ago
0
[WIP; Input Requested] Integrated DS1000 task

#39 benlipkin closed 1 year ago
6
Support Kotlin in MultiPL-E

#38 arjunguha closed 1 year ago
0
Add GSM8k - Math Reasoning Dataset to the evaluation tasks

#37 infinitylogesh closed 1 year ago
7
Add batch evaluation support when batch_size > 1

#36 infinitylogesh opened 1 year ago
8
Add Reasoning tasks to the evaluation

#35 infinitylogesh closed 9 months ago
2
Per-PL perplexity vs. pass@k rate

#34 arjunguha closed 1 year ago
1
Execution-based FIM evaluation

#33 arjunguha closed 1 year ago
0
fix typos

#32 manandey closed 1 year ago
0
Preliminary cite

#31 Muennighoff closed 1 year ago
1
Add metric name to outfile

#30 loubnabnl closed 1 year ago
0
Port BLEU computation

#29 Muennighoff closed 1 year ago
0
add template for new tasks

#28 loubnabnl closed 1 year ago
0
[Minor] Missing task template

#27 JeanKaddour closed 1 year ago
1
[Minor] Conflicting dependencies in requirements.txt

#26 JeanKaddour closed 1 year ago
2
fix dependency

#25 manandey closed 1 year ago
0
Show metric in outfile

#24 Muennighoff closed 1 year ago
2
support for batch size > 1 for single problem generations (n_samples=1)

#23 Muennighoff opened 1 year ago
5
Update requirements.txt

#22 Muennighoff closed 1 year ago
0
Add revision kwarg

#21 Muennighoff closed 1 year ago
0
[WIP] Add CodeXGLUE-text-to-text benchmark for documentation translation

#20 infinitylogesh closed 1 year ago
6
Refactor code to separate tasks

#19 loubnabnl closed 1 year ago
3
Separate generation and evaluation + add CI

#15 loubnabnl closed 1 year ago
0
main() crashes with --allow-code-execution=True

#14 ocramz closed 1 year ago
3
Add CodeXGLUE-code-refinement (few-shot) setting

#13 manandey opened 1 year ago
5
MultiPL-E Integration

#12 loubnabnl closed 1 year ago
4
