bigcode-project / bigcode-evaluation-harness
A framework for the evaluation of autoregressive code generation language models.
Apache License 2.0 · 781 stars · 208 forks
Issues (sorted by: Newest)
#277 · Speedup execute.py: Reuse same manager and dict in · michaelfeil · opened 2 days ago · 0 comments
#276 · trust_remote_code is not being passed to dataset · login256 · opened 1 week ago · 1 comment
#275 · ValueError: Infilling not yet supported for:/Meta-Llama-3.1-8B · kbmlcoding · opened 1 week ago · 0 comments
#274 · Update `fsspec` · younesbelkada · opened 1 week ago · 1 comment
#273 · MultiPL-E now supports Dart, OCaml, Elixir, Haskell, Clojure · arjunguha · closed 1 week ago · 2 comments
#272 · Evaluation result of bigcode/starcoder2-3b on gsm8k_pal does not match the paper · nongfang55 · opened 3 weeks ago · 0 comments
#271 · Evaluating a Model with a Local Dataset in an Offline Environment · ankush13r · opened 3 weeks ago · 8 comments
#270 · Any pypi package for this tool? · zhimin-z · opened 3 weeks ago · 0 comments
#269 · fix typo bug in mbppplus · zkcpku · opened 4 weeks ago · 0 comments
#268 · Unable to execute the `MultiPL-E` task for `python` language · manthan0227 · opened 4 weeks ago · 0 comments
#267 · ImportError: cannot import name 'SyncManager' from partially initialized module 'multiprocessing.managers' · xinghuang2050 · opened 1 month ago · 1 comment
#266 · What is `fine-tuning` in task submission? · zhimin-z · opened 1 month ago · 1 comment
#265 · Downgrade pyext dependency · shehrozek-cerebras · closed 1 month ago · 1 comment
#264 · Error when testing codegeex4-all-9b model · Gumingbro · opened 1 month ago · 1 comment
#263 · Basecodes · Abhineetsoccer · opened 1 month ago · 0 comments
#262 · HumanEval and MBPP results of deepseek-6.7b-coder-instruct are lower than official report of DeepSeek team · jessyford · opened 1 month ago · 2 comments
#261 · HumanEval evaluation results mismatch for Codegemma-2b · berserank · opened 2 months ago · 1 comment
#260 · Add a new benchmark ENAMEL for evaluating the efficiency of LLM-generated code · q-rz · opened 2 months ago · 0 comments
#259 · `multiple-cs`, `multiple-go`, and `multiple-d` are broken (`FileNotFound`) · alat-rights · closed 2 months ago · 2 comments
#258 · [Possibly system specific] Wild (12% vs 20%) run-to-run swings in `multiple-cpp` reported scores · alat-rights · opened 2 months ago · 1 comment
#257 · Fix Max New Tokens in HF's Generation Config · mostafaelhoushi · opened 2 months ago · 2 comments
#256 · Need some context for certain args for Instruct Human Eval · teknium1 · opened 2 months ago · 2 comments
#255 · Update Dockerfile-multiple · arjunguha · closed 2 months ago · 0 comments
#254 · Update MultiPL-E to v3 prompts · arjunguha · closed 2 months ago · 0 comments
#253 · [draft] save prompts and tests passed/failed · kbmlcoding · closed 2 months ago · 0 comments
#252 · The evaluation results are inconsistent across different GPUs · DonteFlynn · opened 2 months ago · 0 comments
#251 · Using the humanevalpack to test the ChatGLM3 model results in an abnormal score. · burger-pb · opened 3 months ago · 0 comments
#250 · Hi · burger-pb · closed 3 months ago · 0 comments
#249 · Fix unnecessary repeated overwrite · nielstron · opened 3 months ago · 0 comments
#248 · Updating SparseML loading to SparseAutoModel · abhinavnmagic · closed 3 months ago · 0 comments
#247 · Fix: Leaderboard submission Documentation · anil-gurbuz · closed 3 months ago · 0 comments
#246 · MBPP Llama3-8B-Instruct pass@1 score lower than expected · YangZhou08 · opened 3 months ago · 1 comment
#245 · Using custom prompts and postprocessing · anil-gurbuz · opened 3 months ago · 1 comment
#244 · Adding support for transformers>=4.40.2 to avoid crash with mbpp · meher-m · closed 3 months ago · 2 comments
#243 · Value error regarding "--max_length_generation" · aladinggit · closed 3 months ago · 1 comment
#242 · Following submission documentation fails to save generated model outputs for our (Jdoodle) private HF model · anil-gurbuz · closed 3 months ago · 3 comments
#241 · fix: rust timeout exception · seobeomjin · closed 3 months ago · 1 comment
#240 · Some questions about APPS · virt9 · closed 3 months ago · 1 comment
#239 · Changes · meher-m · closed 4 months ago · 0 comments
#238 · Add a new dataset Mercury · Elfsong · closed 4 months ago · 2 comments
#237 · How can I pass num_beams? · Sunt-ing · closed 3 months ago · 2 comments
#236 · Fix MBPP bug with transformers 4.38+ · edgan8 · closed 3 months ago · 6 comments
#235 · Run 70b evaluation. · icoderzqliu · closed 4 months ago · 2 comments
#234 · API-based evaluation support (humanevalpack_openai.py is too old) · s-natsubori · opened 4 months ago · 0 comments
#233 · The result of llama2-13b-chat (pass@1 18) is worse than the paper (pass@1 37) · moyi-qwq · closed 4 months ago · 2 comments
#232 · Add package name and enforce python version in `setup.py` · shehrozek-cerebras · closed 5 months ago · 4 comments
#231 · Make `bigcode_eval` pip installable · shehrozek-cerebras · closed 3 months ago · 1 comment
#230 · If I want to add my own prompts before each question, how should I modify the code? · ALLISWELL8 · opened 5 months ago · 1 comment
#229 · Add StudentEval from LLM4Code 2024 · arjunguha · closed 5 months ago · 0 comments
#228 · The Llama3-8b pass@1 results are worse than reported · shuaiwang2022 · opened 5 months ago · 6 comments