Stability-AI lm-evaluation-harness issues

Stability-AI / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

MIT License

143 stars 47 forks

issues


How to add llama3-chat-template into this code?

#121 hnhoangdz opened 3 months ago
0
Please consider providing bibliographic information

#120 s-mizuki-nlp opened 8 months ago
1
Fireworks model support

#119 divchenko closed 1 month ago
0
feat: load in 4bit.

#118 ryan-minato opened 9 months ago
0
Update base.py

#117 dakotamahan-stability closed 9 months ago
1
Automate registry registration for tasks

#116 polm-stability closed 9 months ago
0
Change default Llama2 prompt to Japanese

#115 polm-stability closed 10 months ago
0
Set up the ability to run eval suites

#114 polm-stability closed 9 months ago
3
Compare jcommonsense qa prompts with question first vs last

#113 kumapo opened 10 months ago
0
need additional_special_tokens argument for HFLM initializer

#112 kumapo opened 10 months ago
0
Add option to do stratified sampling for few-shot examples

#111 mrorii closed 10 months ago
0
Use Japanese prompt instead of English for JNLI

#110 mrorii closed 10 months ago
0
Use stratified sampling for few-shots

#109 mrorii closed 10 months ago
1
JRD-95 Implement "string contains" approach for jaqket_v2 and jsquad

#108 mrorii closed 10 months ago
2
preferred templates

#107 kumapo closed 10 months ago
0
Fix llama 2 urls in README.md

#106 upura opened 11 months ago
0
fix a bug for prompt version `0.6` in jaqket_v2

#105 mkshing closed 10 months ago
3
Add prompt version `0.2.1` for JCommonsenseQA

#104 mkshing closed 10 months ago
1
Argparse Refactor

#103 polm-stability closed 10 months ago
2
JSQuAD results of LLaMA 2 models

#102 ikuyamada closed 9 months ago
9
fix bug on `mgsm` for prompt version `0.3`

#101 mkshing closed 11 months ago
1
add llama2 format

#100 mkshing closed 11 months ago
2
won't need llama2/llama2-2.7b due to duplication

#99 kumapo closed 11 months ago
1
Add autoGPTQ installation instructions

#98 webbigdata-jp closed 11 months ago
2
Add option to get response for balanced multiple choice questions

#97 polm-stability closed 11 months ago
4
Fix Linter Related Issues

#96 polm-stability closed 11 months ago
1
Add Balanced Accuracy

#95 polm-stability closed 11 months ago
1
Add evaluation harness for 4 new base models

#94 mrorii closed 11 months ago
0
Add JCoLA task

#93 kumapo closed 11 months ago
10
Verbose output for more tasks

#92 polm-stability closed 11 months ago
0
xwinograd is missing

#91 effendijohanes closed 11 months ago
2
Clarification about JSQuAD

#90 sedrick-keh-tri closed 11 months ago
1
Prompt versions of non-instruction-tuned LLaMA models

#89 ikuyamada opened 1 year ago
1
WIP: add meta-llama/Llama-2-70b-hf

#88 fujiki-1emon closed 12 months ago
0
Add gptq support

#87 webbigdata-jp closed 11 months ago
0
add evaluation results for rinna/japanese-gpt-neox-small

#86 CommonGardeniaM closed 11 months ago
0
add evaluation results for weblab-10b models

#85 kojima-takeshi188 opened 1 year ago
0
compare results between Jsquad prompt with title and without title

#84 kumapo closed 11 months ago
8
llama2 70B causes OOM

#83 congdamaS opened 1 year ago
4
Set pad_token_id for hugging face model to suppress warning

#82 kishida opened 1 year ago
1
add results for line-corporation large models

#81 kumapo closed 11 months ago
1
Fix long prompt handling in mgsm

#80 polm-stability closed 1 year ago
0
make dir only if directory is specified in 'output_path'

#79 kishida closed 11 months ago
0
Add configs for testing harness

#78 polm-stability closed 1 year ago
0
Add 8-task scores for rinna bilingual models

#77 mkshing closed 1 year ago
2
Fix Japanese requirements

#76 polm-stability closed 1 year ago
0
add results from `polylm`

#75 fujiki-1emon opened 1 year ago
0
Add 4 new evaluation tasks for 4 JP models

#74 mrorii closed 1 year ago
0
Add 7b jav2 700b result

#73 leemengtw closed 1 year ago
0
Add 4 new evaluation tasks for 6 JP models

#72 mrorii closed 1 year ago
0