bigcode-project/bigcode-evaluation-harness issues

A framework for the evaluation of autoregressive code generation language models.

Apache License 2.0

702 stars, 180 forks

Issues (sorted by newest)

#151 make HumanEvalSynthesize error more informative (loubnabnl, closed 8 months ago, 0 comments)
#150 Phi 1.5 evaluation problem (Anindyadeep, closed 8 months ago, 2 comments)
#149 Update README.md (loubnabnl, closed 8 months ago, 0 comments)
#148 Is it possible to run the harness against API hosted models? (pnewhook, opened 8 months ago, 3 comments)
#147 Multi card operation large model (ALLISWELL8, closed 1 week ago, 2 comments)
#146 code (Lzzzzx, closed 7 months ago, 1 comment)
#145 solved (ALLISWELL8, closed 8 months ago, 0 comments)
#144 fixed the name of instruct-humaneval (Keytoyze, closed 9 months ago, 0 comments)
#143 Add --save_references_path into args (iCSawyer, closed 6 months ago, 2 comments)
#142 The results are different from those of the codellama paper (ALLISWELL8, closed 1 week ago, 6 comments)
#141 pip install -e . Error (jihyeleekr, closed 9 months ago, 0 comments)
#140 args.batch_size vs args.num_samples (awasthiabhijeet, closed 9 months ago, 2 comments)
#139 code problem (ALLISWELL8, closed 7 months ago, 1 comment)
#138 Cannot Reproduce SantaCoder pass@1 on HumanEval (ZhangzihanGit, closed 9 months ago, 2 comments)
#137 code, How to Test Your Private Dataset (ALLISWELL8, closed 7 months ago, 1 comment)
#136 Evaluation of the task humanevalexplaindescribe (wang-weishi, closed 9 months ago, 5 comments)
#135 allow auto for max_memory_per_gpu (loubnabnl, closed 10 months ago, 0 comments)
#134 StarCoderCommit Prompt (awasthiabhijeet, closed 10 months ago, 2 comments)
#133 Add WizardCoder models (that are CodeLLama based) evaluation (loubnabnl, closed 9 months ago, 2 comments)
#132 fix typo in MultiPL-E bash script (ChiYeungLaw, closed 10 months ago, 0 comments)
#131 'HumanEval' object has no attribute 'dataset' (dongguanting, closed 4 months ago, 5 comments)
#130 add codellama prompt to humanevalsynthesize (loubnabnl, closed 10 months ago, 2 comments)
#129 ModuleNotFoundError: No module named 'lm_eval.ds' (dhingratul, closed 10 months ago, 2 comments)
#128 print precision in correct setting (loubnabnl, closed 10 months ago, 0 comments)
#127 Recode: perturbed human-eval (RaymondLi0, closed 10 months ago, 1 comment)
#126 Fix str concat (Muennighoff, closed 10 months ago, 0 comments)
#125 Update stop tokens for mbpp and humaneval (VikParuchuri, closed 10 months ago, 0 comments)
#124 fix post-processing of mbpp (loubnabnl, closed 11 months ago, 0 comments)
#123 update leaderboard guide (loubnabnl, closed 11 months ago, 0 comments)
#122 add guide for reproducing leaderboard results and submitting new ones (loubnabnl, closed 11 months ago, 0 comments)
#121 bug in post processing for mbpp task (weiliang-zeng, closed 11 months ago, 3 comments)
#120 Add HumanEvalPack, QuixBugs, Python Bugs, Parity (Muennighoff, closed 10 months ago, 1 comment)
#119 Different stop words between humaneval and instruct humaneval (GinaJihyeonLee, closed 11 months ago, 1 comment)
#118 skipdecode for starcoder (harness) (danielkorat, closed 11 months ago, 0 comments)
#117 installation issue with pip install git+https://github.com/bigcode-project/bigcode-evaluation-harness.git (changwangss, closed 11 months ago, 2 comments)
#116 improve install (changwangss, closed 11 months ago, 1 comment)
#115 Adding additional optional args for decoding flags and AutoModel kwargs to support models like ReplitLM (madhavatreplit, opened 12 months ago, 0 comments)
#114 Support for unmerged peft adapters (cassanof, closed 11 months ago, 1 comment)
#113 Update MultiPL-E docker image (loubnabnl, opened 1 year ago, 0 comments)
#112 Add missing MultiPL-E imports (loubnabnl, closed 1 year ago, 0 comments)
#111 Doc update - `max_length_generation` requirement for GSM tasks (infinitylogesh, closed 1 year ago, 0 comments)
#110 Reproduction of HELM results (tangzhy, closed 1 year ago, 1 comment)
#109 fixed a couple of typos (didier-durand, closed 1 year ago, 1 comment)
#108 When I evaluated the dataset APPS, I got the error RuntimeError: stack.size() >= frames.back().function->n_inputs INTERNAL ASSERT FAILED (Luowaterbi, closed 1 year ago, 0 comments)
#107 Update and rename openai.py to openai_generation.py (terryyz, closed 11 months ago, 0 comments)
#106 MBPP eval extremely slow for CodeGen2 and Replit-Code (AadSah, closed 11 months ago, 4 comments)
#105 error: list index out of range, when testing in multi-gpu? (wwngh1233, closed 1 week ago, 6 comments)
#104 Support Seq2SeqLM model class (to facilitate the CodeT5+ models) (keyboardAnt, opened 1 year ago, 0 comments)
#103 how to use --instruction_tokens? (wwngh1233, closed 11 months ago, 1 comment)
#102 Support `Salesforce/codet5p-220m` and other `T5ForConditionalGeneration` models (keyboardAnt, closed 1 week ago, 1 comment)