bigcode-project bigcode-evaluation-harness issues

A framework for the evaluation of autoregressive code generation language models.

Apache License 2.0

825 stars 219 forks

issues

Adding an assistant_model Argument for Speculative Decoding

#288 ilyasoulk opened 4 hours ago
0
Could you share a completed file of generations_mbppplus.json

#287 marybloodyzz opened 1 week ago
0
Fix typos

#286 gameofby opened 2 weeks ago
0
mbpp pass@1 is lower than mbppplus pass@1 on starcoder2?

#285 Liuzzyg opened 2 weeks ago
0
Continuing / Extending Previous Results from Generating and Evaluating?

#284 RylanSchaeffer opened 3 weeks ago
0
Docker image for multiple evaluation broken

#283 Extirpater opened 3 weeks ago
1
[REQUEST] Apply tokenizer chat template for HumanEvalPack

#282 timrbula opened 4 weeks ago
1
add support for hpu devices

#281 envsp opened 4 weeks ago
0
"," missing in LANGUAGES list

#280 ArtemisDicoTiar opened 1 month ago
2
Update docker image for HumanEvalPack-Synthesize.

#279 zongyf02 closed 4 weeks ago
2
[REQUEST] support model-parallel evaluation for big models

#278 tarahjjeon opened 1 month ago
0
Speedup execute.py: Reuse same manager and dict in

#277 michaelfeil opened 1 month ago
0
trust_remote_code is not being passed to dataset

#276 login256 opened 1 month ago
1
ValueError: Infilling not yet supported for:/Meta-Llama-3.1-8B

#275 kbmlcoding opened 2 months ago
0
Update `fsspec`

#274 younesbelkada closed 3 weeks ago
4
MultiPL-E now supports Dart, OCaml, Elixir, Haskell, Clojure

#273 arjunguha closed 1 month ago
2
Evaluation result of bigcode/starcoder2-3b on gsm8k_pal does not match the paper

#272 nongfang55 opened 2 months ago
0
Evaluating a Model with a Local Dataset in an Offline Environment

#271 ankush13r opened 2 months ago
8
Any pypi package for this tool?

#270 zhimin-z opened 2 months ago
0
fix typo bug in mbppplus

#269 zkcpku closed 1 month ago
0
Unable to execute the `MultiPL-E` task for `python` language

#268 manthan0227 opened 2 months ago
0
ImportError: cannot import name 'SyncManager' from partially initialized module 'multiprocessing.managers'

#267 xinghuang2050 opened 2 months ago
1
What is `fine-tuning` in task submission?

#266 zhimin-z opened 2 months ago
1
Downgrade pyext dependency

#265 shehrozek-cerebras closed 2 months ago
1
Error of testing codegeex4-all-9b model

#264 Gumingbro opened 3 months ago
1
Basecodes

#263 Abhineetsoccer opened 3 months ago
0
Humaneval and MBPP results of deepseek-6.7b-coder-instruct are lower than official report of Deepseek team

#262 jessyford opened 3 months ago
2
HumanEval evaluation results mismatch for Codegemma-2b

#261 berserank opened 3 months ago
1
Add a new benchmark ENAMEL for evaluating the efficiency of LLM-generated code

#260 q-rz opened 4 months ago
0
`multiple-cs`, `multiple-go`, and `multiple-d` are broken (`FileNotFound`)

#259 alat-rights closed 4 months ago
2
[Possibly system specific] Wild (12% vs 20%) run-to-run swings in `multiple-cpp` reported scores

#258 alat-rights opened 4 months ago
1
Fix Max New Tokens in HF's Generation Config

#257 mostafaelhoushi opened 4 months ago
2
Need some context for certain args for Instruct Human Eval

#256 teknium1 opened 4 months ago
2
Update Dockerfile-multiple

#255 arjunguha closed 4 months ago
0
Update MultiPL-E to v3 prompts

#254 arjunguha closed 4 months ago
0
[draft] save prompts and tests passed/failed

#253 kbmlcoding closed 4 months ago
0
The evaluation results are inconsistent across different GPUs

#252 DonteFlynn opened 4 months ago
0
Using the humanevalpack to test the ChatGLM3 model results in an abnormal score.

#251 burger-pb opened 4 months ago
0
Hi

#250 burger-pb closed 4 months ago
0
Fix unnecessary repeated overwrite

#249 nielstron opened 4 months ago
0
Updating SparseML loading to SparseAutoModel

#248 abhinavnmagic closed 5 months ago
0
Fix: Leaderboard submission Documentation

#247 anil-gurbuz closed 5 months ago
0
MBPP Llama3-8B-Instruct lower pass@1 score expected

#246 YangZhou08 opened 5 months ago
1
Using custom prompts and postprocessing

#245 anil-gurbuz opened 5 months ago
1
Adding support for transformers>=4.40.2 to avoid crash with mbpp

#244 meher-m closed 5 months ago
2
Value error regarding "--max_length_generation"

#243 aladinggit closed 5 months ago
1
Following submission documentation fails to save generated model outputs for our (Jdoodle) private HF model

#242 anil-gurbuz closed 5 months ago
3
fix: rust timeout exception

#241 seobeomjin closed 5 months ago
1
Some questions about APPS

#240 virt9 closed 5 months ago
1
Changes

#239 meher-m closed 5 months ago
0