declare-lab / instruct-eval
This repository contains code to quantitatively evaluate instruction-tuned models such as Alpaca and Flan-T5 on held-out tasks.
https://declare-lab.github.io/instruct-eval/
Apache License 2.0
528 stars · 42 forks
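As a rough illustration of what "quantitatively evaluate instruction-tuned models on held-out tasks" means in practice, here is a minimal hand-written sketch that scores a single multiple-choice question with an instruction-tuned seq2seq model via the Hugging Face transformers API. It is not the repository's own CLI or evaluation code; the model name and the example question are assumptions chosen purely for illustration.

```python
# Minimal sketch of held-out multiple-choice evaluation (illustrative only,
# not instruct-eval's actual entry point). Assumes `transformers` is installed.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-base"  # assumption: any instruction-tuned seq2seq model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# One MMLU-style question; a real run would loop over a full held-out test set.
prompt = (
    "Answer the following multiple-choice question with a single letter.\n"
    "Question: Which planet is known as the Red Planet?\n"
    "A. Venus\nB. Mars\nC. Jupiter\nD. Saturn\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=2)
prediction = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()

gold = "B"  # reference answer for this toy question
print(f"prediction={prediction!r}, correct={prediction.startswith(gold)}")
```

Accuracy over a benchmark such as MMLU would then be the fraction of questions where the predicted letter matches the reference answer.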
Issues (newest first)
#34 Support for lm_eval v0.4 and higher · Shinning-Zhou · opened 1 month ago · 0 comments
#33 Multi GPU Support is required · chintan-ushur · opened 5 months ago · 0 comments
#32 Evaluate EncoderDecoderModels · Bachstelze · opened 9 months ago · 0 comments
#31 Colab notebook · Bachstelze · opened 9 months ago · 0 comments
#30 CRASS · Bachstelze · opened 9 months ago · 0 comments
#29 Evaluate on a single 24GB/32GB GPU · lemyx · opened 9 months ago · 1 comment
#28 How to submit own model to leaderboard? · timothylimyl · opened 10 months ago · 1 comment
#27 [Prompt Template] Silent bug - Performance Killer · timothylimyl · opened 10 months ago · 0 comments
#26 What are the metrics for the evaluation results? · zhimin-z · opened 11 months ago · 0 comments
#25 Reproduce the accuracy of chavinlo/alpaca-native on MMLU · sglucas · opened 1 year ago · 0 comments
#24 Update hhh.py · iampushpdeep · opened 1 year ago · 0 comments
#23 modify gitignore and fix the bug when run humaneval · yjw1029 · opened 1 year ago · 0 comments
#22 Fail to Evaluate Model on human_eval · yjw1029 · closed 1 year ago · 1 comment
#21 What to do about broken Evals? · damhack · closed 1 year ago · 1 comment
#20 HHH Benchmark evaluation question: why using base prompt and (A - A_base) > (B - B_base)? · t170815518 · closed 1 year ago · 1 comment
#19 Fix a bug in completion extraction · likaixin2000 · closed 1 year ago · 0 comments
#18 Support for larger batch_size · soumyasanyal · closed 1 year ago · 1 comment
#17 C-Eval · duanqiyuan · opened 1 year ago · 0 comments
#16 Can support for the baichuan large language model be added? · linghongli · opened 1 year ago · 1 comment
#15 add multiple gpu support · lxy444 · opened 1 year ago · 0 comments
#14 [Feature Request] Saving Prediction Results · guanqun-yang · opened 1 year ago · 0 comments
#13 Is there any parallel processing methods? · wwngh1233 · opened 1 year ago · 0 comments
#12 Add config to save eval results · arthurtobler · opened 1 year ago · 0 comments
#11 Future directions · tju01 · closed 1 year ago · 1 comment
#10 Regarding the comparison to lm-evaluation-harness · gakada · opened 1 year ago · 0 comments
#9 Error reported during execution, details as follows · linghongli · closed 1 year ago · 1 comment
#8 HHH benchmark · Emrys-Hong · closed 1 year ago · 0 comments
#7 Integrate the evaluation in the Transformers trainer with transformers.TrainerCallback · BaohaoLiao · opened 1 year ago · 1 comment
#6 AutoModelForCausalLM supports llama models now · passaglia · opened 1 year ago · 1 comment
#5 Add License · passaglia · closed 1 year ago · 2 comments
#4 Add zero-shot evaluation results · LeeShiyang · opened 1 year ago · 1 comment
#3 Can not reproduce results on the table · simplelifetime · opened 1 year ago · 7 comments
#2 Prompt format for LLaMa · LeeShiyang · closed 1 year ago · 2 comments
#1 Humaneval · Emrys-Hong · closed 1 year ago · 0 comments