bigscience-workshop/evaluation
Code and Data for Evaluation WG
Other
41 stars, 24 forks
Issues (newest first)
#82 build: sync poetry, setup.py, and requirements with safer deps (tianjianjiang, opened 2 years ago, 0 comments)
#81 adding arabic sentences for the shade dataset (maraimm, opened 2 years ago, 0 comments)
#80 adding shades_nationality_hi.csv (shanyas10, opened 2 years ago, 0 comments)
#79 Robustly bump dependency versions (jaketae, closed 2 years ago, 2 comments)
#78 bias-shades dataset (manandey, closed 2 years ago, 0 comments)
#77 Restructuring french dictionary and adding more. (mmitchellai, closed 2 years ago, 0 comments)
#76 Adding WebNLG dataset (GEM benchmark) (jordiclive, closed 2 years ago, 3 comments)
#75 Adding WebNLG dataset (GEM benchmark) (jordiclive, closed 2 years ago, 0 comments)
#74 fix: update the version of transformers to support in evaluating gpt-neo models (YU-Anthony, opened 2 years ago, 1 comment)
#73 Add MKQA to Full Benchmark (shayne-longpre, opened 2 years ago, 0 comments)
#72 Add BLiMP task (jumelet, opened 2 years ago, 0 comments)
#71 Add ANLI dataset (omerant, opened 2 years ago, 0 comments)
#70 Add CrowS-Pairs task (oskarvanderwal, opened 3 years ago, 0 comments)
#69 Add HANS dataset (aakanksha19, opened 3 years ago, 0 comments)
#68 feat: Add LAMA TREX task (JanKalo, closed 2 years ago, 0 comments)
#67 add jigsaw_toxicity_pred data (trishalaneeraj, closed 3 years ago, 2 comments)
#66 data path for custom datasets (debajyotidatta, closed 3 years ago, 0 comments)
#65 Add XQuAD and PIAF datasets (ryanzhumich, closed 3 years ago, 0 comments)
#64 Refactor task template to merge multilingual.json and english.json (marinecarpuat, opened 3 years ago, 0 comments)
#63 README cleanup + CONTRIBUTING.md (wilsonyhlee, closed 3 years ago, 3 comments)
#62 feat: use promptsource templates (tianjianjiang, opened 3 years ago, 0 comments)
#61 Wrap evaluation benchmark using HF-trainer (sbmaruf, opened 3 years ago, 2 comments)
#60 Add basic template (jaketae, closed 3 years ago, 1 comment)
#59 Add PIQA dataset (tttyuntian, closed 3 years ago, 0 comments)
#58 Add WMT dataset (jaketae, closed 3 years ago, 5 comments)
#57 Setup testing (jaketae, opened 3 years ago, 2 comments)
#56 build: apply a typical project boilerplate of BigScience repo (tianjianjiang, closed 3 years ago, 1 comment)
#55 Modify `AutoTask` + adapt `TyDiQA` (jaketae, closed 3 years ago, 1 comment)
#54 Start overleaf for benchmark tech report (epavlick, opened 3 years ago, 1 comment)
#53 Refactor evaluation pipeline (jaketae, closed 3 years ago, 4 comments)
#52 Refactor overall directory structure (jaketae, closed 3 years ago, 6 comments)
#51 Add LAMBADA evaluation script (jaketae, closed 3 years ago, 0 comments)
#50 translate validation prompts into all training languages (epavlick, opened 3 years ago, 9 comments)
#49 benchmark mt5 on tydiqa prompting setup (epavlick, opened 3 years ago, 1 comment)
#48 Convert validation code to work with Megatron as well as huggingface (epavlick, opened 3 years ago, 0 comments)
#47 Add lambada to validation set (epavlick, closed 3 years ago, 0 comments)
#46 Add PIQA to validation set (epavlick, closed 3 years ago, 0 comments)
#45 Adding datasets in full benchmark to the README (wilsonyhlee, closed 3 years ago, 0 comments)
#44 Add TyDi QA to Simple Benchmark (wilsonyhlee, closed 3 years ago, 0 comments)
#43 Set code conventions (jaketae, closed 3 years ago, 3 comments)
#42 Add WMT evaluation script (jaketae, closed 3 years ago, 1 comment)
#41 Add XSum to Full Benchmark (aobaruwa, closed 3 years ago, 5 comments)
#40 Create toy tasks/dummy code for fine-tuning evals (epavlick, opened 3 years ago, 1 comment)
#39 Create toy tasks/dummy code for prompt-based evals (epavlick, closed 3 years ago, 1 comment)
#38 Create Targeted Minimal Pair "Stress-Tests" for Sensitivity to Social Groups (epavlick, opened 3 years ago, 2 comments)
#37 Add CrowS-Pairs to Full Benchmark (epavlick, opened 3 years ago, 5 comments)
#36 Add Jigsaw Toxicity Classification to Full Benchmark (epavlick, opened 3 years ago, 4 comments)
#35 Add WinoMT to Full Benchmark (epavlick, opened 3 years ago, 2 comments)
#34 Add HANS to Full Benchmark (epavlick, opened 3 years ago, 3 comments)
#33 Add MNLI to Full Benchmark (epavlick, opened 3 years ago, 3 comments)