bigscience-workshop evaluation issues

bigscience-workshop / evaluation

Code and Data for Evaluation WG

Other

41 stars, 24 forks

build: sync poetry, setup.py, and requirements with safer deps

#82 tianjianjiang opened 2 years ago
0
adding arabic sentences for the shade dataset

#81 maraimm opened 2 years ago
0
adding shades_nationality_hi.csv

#80 shanyas10 opened 2 years ago
0
Robustly bump dependency versions

#79 jaketae closed 2 years ago
2
bias-shades dataset

#78 manandey closed 2 years ago
0
Restructuring french dictionary and adding more.

#77 mmitchellai closed 2 years ago
0
Adding WebNLG dataset (GEM benchmark)

#76 jordiclive closed 2 years ago
3
Adding WebNLG dataset (GEM benchmark)

#75 jordiclive closed 2 years ago
0
fix: update the version of transformers to support in evaluating gpt-neo models

#74 YU-Anthony opened 2 years ago
1
Add MKQA to Full Benchmark

#73 shayne-longpre opened 2 years ago
0
Add BLiMP task

#72 jumelet opened 2 years ago
0
Add ANLI dataset

#71 omerant opened 2 years ago
0
Add CrowS-Pairs task

#70 oskarvanderwal opened 3 years ago
0
Add HANS dataset

#69 aakanksha19 opened 3 years ago
0
feat: Add LAMA TREX task

#68 JanKalo closed 2 years ago
0
add jigsaw_toxicity_pred data

#67 trishalaneeraj closed 3 years ago
2
data path for custom datasets

#66 debajyotidatta closed 3 years ago
0
Add XQuAD and PIAF datasets

#65 ryanzhumich closed 3 years ago
0
Refactor task template to merge multilingual.json and english.json

#64 marinecarpuat opened 3 years ago
0
README cleanup + CONTRIBUTING.md

#63 wilsonyhlee closed 3 years ago
3
feat: use promptsource templates

#62 tianjianjiang opened 3 years ago
0
Wrap evaluation benchmark using HF-trainer

#61 sbmaruf opened 3 years ago
2
Add basic template

#60 jaketae closed 3 years ago
1
Add PIQA dataset

#59 tttyuntian closed 3 years ago
0
Add WMT dataset

#58 jaketae closed 3 years ago
5
Setup testing

#57 jaketae opened 3 years ago
2
build: apply a typical project boilerplate of BigScience repo

#56 tianjianjiang closed 3 years ago
1
Modify `AutoTask` + adapt `TyDiQA`

#55 jaketae closed 3 years ago
1
Start overleaf for benchmark tech report

#54 epavlick opened 3 years ago
1
Refactor evaluation pipeline

#53 jaketae closed 3 years ago
4
Refactor overall directory structure

#52 jaketae closed 3 years ago
6
Add LAMBADA evaluation script

#51 jaketae closed 3 years ago
0
translate validation prompts into all training languages

#50 epavlick opened 3 years ago
9
benchmark mt5 on tydiqa prompting setup

#49 epavlick opened 3 years ago
1
Convert validation code to work with Megatron as well as huggingface

#48 epavlick opened 3 years ago
0
Add lambada to validation set

#47 epavlick closed 3 years ago
0
Add PIQA to validation set

#46 epavlick closed 3 years ago
0
Adding datasets in full benchmark to the README

#45 wilsonyhlee closed 3 years ago
0
Add TyDi QA to Simple Benchmark

#44 wilsonyhlee closed 3 years ago
0
Set code conventions

#43 jaketae closed 3 years ago
3
Add WMT evaluation script

#42 jaketae closed 3 years ago
1
Add XSum to Full Benchmark

#41 aobaruwa closed 3 years ago
5
Create toy tasks/dummy code for fine-tuning evals

#40 epavlick opened 3 years ago
1
Create toy tasks/dummy code for prompt-based evals

#39 epavlick closed 3 years ago
1
Create Targeted Minimal Pair "Stress-Tests" for Sensitivity to Social Groups

#38 epavlick opened 3 years ago
2
Add CrowS-Pairs to Full Benchmark

#37 epavlick opened 3 years ago
5
Add Jigsaw Toxicity Classification to Full Benchmark

#36 epavlick opened 3 years ago
4
Add WinoMT to Full Benchmark

#35 epavlick opened 3 years ago
2
Add HANS to Full Benchmark

#34 epavlick opened 3 years ago
3
Add MNLI to Full Benchmark

#33 epavlick opened 3 years ago
3