bigcode-project/bigcode-evaluation-harness
A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0
825 stars
219 forks
issues
Adding an assistant_model Argument for Speculative Decoding
#288
ilyasoulk
opened
4 hours ago
0
Could you share a completed file of generations_mbppplus.json
#287
marybloodyzz
opened
1 week ago
0
Fix typos
#286
gameofby
opened
2 weeks ago
0
mbpp pass@1 is lower than mbppplus pass@1 on starcoder2?
#285
Liuzzyg
opened
2 weeks ago
0
Continuing / Extending Previous Results from Generating and Evaluating?
#284
RylanSchaeffer
opened
3 weeks ago
0
Docker image for multiple evaluation broken
#283
Extirpater
opened
3 weeks ago
1
[REQUEST] Apply tokenizer chat template for HumanEvalPack
#282
timrbula
opened
4 weeks ago
1
add support for hpu devices
#281
envsp
opened
4 weeks ago
0
"," missing in LANGUAGES list
#280
ArtemisDicoTiar
opened
1 month ago
2
Update docker image for HumanEvalPack-Synthesize.
#279
zongyf02
closed
4 weeks ago
2
[REQUEST] support model-parallel evaluation for big models
#278
tarahjjeon
opened
1 month ago
0
Speedup execute.py: Reuse same manager and dict in
#277
michaelfeil
opened
1 month ago
0
trust_remote_code is not being passed to dataset
#276
login256
opened
1 month ago
1
ValueError: Infilling not yet supported for:/Meta-Llama-3.1-8B
#275
kbmlcoding
opened
2 months ago
0
Update `fsspec`
#274
younesbelkada
closed
3 weeks ago
4
MultiPL-E now supports Dart, OCaml, Elixir, Haskell, Clojure
#273
arjunguha
closed
1 month ago
2
Evaluation result of bigcode/starcoder2-3b on gsm8k_pal does not match the paper
#272
nongfang55
opened
2 months ago
0
Evaluating a Model with a Local Dataset in an Offline Environment
#271
ankush13r
opened
2 months ago
8
Any pypi package for this tool?
#270
zhimin-z
opened
2 months ago
0
fix typo bug in mbppplus
#269
zkcpku
closed
1 month ago
0
Unable to execute the `MultiPL-E` task for `python` language
#268
manthan0227
opened
2 months ago
0
ImportError: cannot import name 'SyncManager' from partially initialized module 'multiprocessing.managers'
#267
xinghuang2050
opened
2 months ago
1
What is `fine-tuning` in task submission?
#266
zhimin-z
opened
2 months ago
1
Downgrade pyext dependency
#265
shehrozek-cerebras
closed
2 months ago
1
Error of testing codegeex4-all-9b model
#264
Gumingbro
opened
3 months ago
1
Basecodes
#263
Abhineetsoccer
opened
3 months ago
0
HumanEval and MBPP results of deepseek-6.7b-coder-instruct are lower than the official report of the DeepSeek team
#262
jessyford
opened
3 months ago
2
HumanEval evaluation results mismatch for Codegemma-2b
#261
berserank
opened
3 months ago
1
Add a new benchmark ENAMEL for evaluating the efficiency of LLM-generated code
#260
q-rz
opened
4 months ago
0
`multiple-cs`, `multiple-go`, and `multiple-d` are broken (`FileNotFound`)
#259
alat-rights
closed
4 months ago
2
[Possibly system specific] Wild (12% vs 20%) run-to-run swings in `multiple-cpp` reported scores
#258
alat-rights
opened
4 months ago
1
Fix Max New Tokens in HF's Generation Config
#257
mostafaelhoushi
opened
4 months ago
2
Need some context for certain args for Instruct Human Eval
#256
teknium1
opened
4 months ago
2
Update Dockerfile-multiple
#255
arjunguha
closed
4 months ago
0
Update MultiPL-E to v3 prompts
#254
arjunguha
closed
4 months ago
0
[draft] save prompts and tests passed/failed
#253
kbmlcoding
closed
4 months ago
0
The evaluation results are inconsistent across different GPUs
#252
DonteFlynn
opened
4 months ago
0
Using the humanevalpack to test the ChatGLM3 model results in an abnormal score.
#251
burger-pb
opened
4 months ago
0
Hi
#250
burger-pb
closed
4 months ago
0
Fix unnecessary repeated overwrite
#249
nielstron
opened
4 months ago
0
Updating SparseML loading to SparseAutoModel
#248
abhinavnmagic
closed
5 months ago
0
Fix: Leaderboard submission Documentation
#247
anil-gurbuz
closed
5 months ago
0
MBPP Llama3-8B-Instruct lower pass@1 score than expected
#246
YangZhou08
opened
5 months ago
1
Using custom prompts and postprocessing
#245
anil-gurbuz
opened
5 months ago
1
Adding support for transformers>=4.40.2 to avoid crash with mbpp
#244
meher-m
closed
5 months ago
2
Value error regarding "--max_length_generation"
#243
aladinggit
closed
5 months ago
1
Following submission documentation fails to save generated model outputs for our (Jdoodle) private HF model
#242
anil-gurbuz
closed
5 months ago
3
fix: rust timeout exception
#241
seobeomjin
closed
5 months ago
1
Some questions about APPS
#240
virt9
closed
5 months ago
1
Changes
#239
meher-m
closed
5 months ago
0