declare-lab / instruct-eval
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
https://declare-lab.github.io/instruct-eval/
Apache License 2.0
528 stars · 42 forks
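As a rough illustration of what "quantitatively evaluate instruction-tuned models on held-out tasks" means in practice, here is a minimal hand-written sketch that scores a single multiple-choice question with an instruction-tuned seq2seq model via the Hugging Face transformers API. It is not the repository's own CLI or evaluation code; the model name and the example question are assumptions chosen purely for illustration.

```python
# Minimal sketch of held-out multiple-choice evaluation (illustrative only,
# not instruct-eval's actual entry point). Assumes `transformers` is installed.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-base"  # assumption: any instruction-tuned seq2seq model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# One MMLU-style question; a real run would loop over a full held-out test set.
prompt = (
    "Answer the following multiple-choice question with a single letter.\n"
    "Question: Which planet is known as the Red Planet?\n"
    "A. Venus\nB. Mars\nC. Jupiter\nD. Saturn\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=2)
prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()

gold = "B"  # reference answer for this toy question
print(f"prediction={prediction!r}, correct={prediction.startswith(gold)}")
```

Accuracy over a benchmark such as MMLU would then be the fraction of questions where the predicted letter matches the reference answer.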
Issues (newest first)
#34 Support for lm_eval v0.4 and higher · Shinning-Zhou · opened 1 month ago · 0 comments
#33 Multi GPU Support is required · chintan-ushur · opened 5 months ago · 0 comments
#32 Evaluate EncoderDecoderModels · Bachstelze · opened 9 months ago · 0 comments
#31 Colab notebook · Bachstelze · opened 9 months ago · 0 comments
#30 CRASS · Bachstelze · opened 9 months ago · 0 comments
#29 Evaluate on a single 24GB/32GB GPU · lemyx · opened 9 months ago · 1 comment
#28 How to submit own model to leaderboard? · timothylimyl · opened 10 months ago · 1 comment
#27 [Prompt Template] Silent bug - Performance Killer · timothylimyl · opened 10 months ago · 0 comments
#26 What are the metrics for the evaluation results? · zhimin-z · opened 11 months ago · 0 comments
#25 Reproduce the accuracy of chavinlo/alpaca-native on MMLU · sglucas · opened 1 year ago · 0 comments
#24 Update hhh.py · iampushpdeep · opened 1 year ago · 0 comments
#23 modify gitignore and fix the bug when run humaneval · yjw1029 · opened 1 year ago · 0 comments
#22 Fail to Evaluate Model on human_eval · yjw1029 · closed 1 year ago · 1 comment
#21 What to do about broken Evals? · damhack · closed 1 year ago · 1 comment
#20 HHH Benchmark evaluation question: why using base prompt and (A - A_base) > (B - B_base)? · t170815518 · closed 1 year ago · 1 comment
#19 Fix a bug in completion extraction · likaixin2000 · closed 1 year ago · 0 comments
#18 Support for larger batch_size · soumyasanyal · closed 1 year ago · 1 comment
#17 C-Eval · duanqiyuan · opened 1 year ago · 0 comments
#16 Can support for the baichuan large language model be added? · linghongli · opened 1 year ago · 1 comment
#15 add multiple gpu support · lxy444 · opened 1 year ago · 0 comments
#14 [Feature Request] Saving Prediction Results · guanqun-yang · opened 1 year ago · 0 comments
#13 Is there any parallel processing methods? · wwngh1233 · opened 1 year ago · 0 comments
#12 Add config to save eval results · arthurtobler · opened 1 year ago · 0 comments
#11 Future directions · tju01 · closed 1 year ago · 1 comment
#10 Regarding the comparison to lm-evaluation-harness · gakada · opened 1 year ago · 0 comments
#9 Error reported during execution, details as follows · linghongli · closed 1 year ago · 1 comment
#8 HHH benchmark · Emrys-Hong · closed 1 year ago · 0 comments
#7 Integrate the evaluation in the Transformers trainer with transformers.TrainerCallback · BaohaoLiao · opened 1 year ago · 1 comment
#6 AutoModelForCausalLM supports llama models now · passaglia · opened 1 year ago · 1 comment
#5 Add License · passaglia · closed 1 year ago · 2 comments
#4 Add zero-shot evaluation results · LeeShiyang · opened 1 year ago · 1 comment
#3 Can not reproduce results on the table · simplelifetime · opened 1 year ago · 7 comments
#2 Prompt format for LLaMa · LeeShiyang · closed 1 year ago · 2 comments
#1 Humaneval · Emrys-Hong · closed 1 year ago · 0 comments