Running Eval Task by each model - Githubissues

OpenThaiGPT / openthaigpt-pretraining

Apache License 2.0

Running Eval Task by each model #271

Open ArthurMinovsky opened 1 year ago

ArthurMinovsky commented 1 year ago

Test zero-shot and few-shot (5 shots) on each dataset listed in the Notion page.

To do

[ ] Test the eval pipeline with one model
[ ] Run batch.sh on Lanta with the 1-model, 1-GPU config
[ ] Run batch.sh on Lanta for the model added by New
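
For reference, here is a minimal sketch of how the zero-shot and 5-shot prompts could differ. It is not the repo's actual eval pipeline: the function name `build_prompt`, the `text`/`label` field names, and the prompt template are all assumptions made for illustration.

```python
import random


def build_prompt(train_examples, query, k=5, seed=0):
    """Build a k-shot prompt; k=0 gives a zero-shot prompt.

    `train_examples` is a list of dicts with hypothetical keys
    "text" and "label"; real datasets will use other field names.
    """
    rng = random.Random(seed)  # fixed seed so the chosen shots are reproducible
    k = min(k, len(train_examples))  # never ask for more shots than exist
    shots = rng.sample(train_examples, k) if k > 0 else []
    # One block per demonstration, followed by the unanswered query.
    blocks = [f"Text: {ex['text']}\nLabel: {ex['label']}" for ex in shots]
    blocks.append(f"Text: {query}\nLabel:")
    return "\n\n".join(blocks)


# Made-up examples, for illustration only.
examples = [{"text": f"sample {i}", "label": "pos"} for i in range(6)]
zero_shot = build_prompt(examples, "new input", k=0)
five_shot = build_prompt(examples, "new input", k=5)
```

Fixing the seed keeps the same demonstrations across models, so score differences come from the model rather than from the sampled shots.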

ArthurMinovsky commented 1 year ago

Thai dataset

[wisesight_sentiment](https://huggingface.co/datasets/wisesight_sentiment/viewer/wisesight_sentiment/train?p=1) → sentiment analysis
[lst20](https://huggingface.co/datasets/lst20) → NER
[xquad.th](https://huggingface.co/datasets/xquad/viewer/xquad.th/validation) → QA
[xcopa](https://huggingface.co/datasets/xcopa) → causal reasoning task

ArthurMinovsky commented 1 year ago

Huggingface dataset (filtered)

Datasets filtered by mUSE score

[Patt/HellaSwag_TH_drop](https://huggingface.co/datasets/Patt/HellaSwag_TH_drop)

→ Rows where any score < 0.5 were dropped.

[Patt/MultiRC_TH_drop](https://huggingface.co/datasets/Patt/MultiRC_TH_drop)

→ Rows where any score < 0.66 were dropped.

[Patt/RTE_TH_drop](https://huggingface.co/datasets/Patt/RTE_TH_drop)

→ Rows where score_hypothesis <= 0.5 or score_premise <= 0.7 were dropped.

[Patt/ReCoRD_TH_drop](https://huggingface.co/datasets/Patt/ReCoRD_TH_drop)

→ Rows with score_answers < 0.8 were dropped, as were rows with score < 0.5 after the penalty.
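
The drop rules above could be expressed roughly as follows. This is only a sketch: `score_hypothesis` and `score_premise` come from the RTE rule, but the helper names, the `score_keys` argument, and the sample rows are hypothetical, and the ReCoRD penalty is left out because its definition is not given here.

```python
def keep_rte_row(row):
    # RTE rule: a row is dropped when score_hypothesis <= 0.5
    # or score_premise <= 0.7, so it is kept only when both exceed
    # their thresholds.
    return row["score_hypothesis"] > 0.5 and row["score_premise"] > 0.7


def keep_if_all_scores_at_least(row, threshold, score_keys):
    # HellaSwag / MultiRC style rule: a row is dropped when any score
    # is below the threshold. `score_keys` (hypothetical) names the
    # score columns to check.
    return all(row[key] >= threshold for key in score_keys)


# Made-up rows, for illustration only.
rte_rows = [
    {"score_hypothesis": 0.9, "score_premise": 0.8},  # kept
    {"score_hypothesis": 0.5, "score_premise": 0.9},  # dropped: hypothesis <= 0.5
    {"score_hypothesis": 0.6, "score_premise": 0.7},  # dropped: premise <= 0.7
]
kept_rte = [row for row in rte_rows if keep_rte_row(row)]

# Hypothetical column names score_a and score_b.
multi_rows = [{"score_a": 0.7, "score_b": 0.66}, {"score_a": 0.9, "score_b": 0.6}]
kept_multi = [
    row for row in multi_rows
    if keep_if_all_scores_at_least(row, 0.66, ["score_a", "score_b"])
]
```

Note the boundary behaviour: the RTE rule drops rows sitting exactly on a threshold, while the "score < threshold" rules keep them.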

Pattptr commented 1 year ago

Dataset checklist

Thai

[x] wisesight_sentiment
[ ] lst20
[x] xquad.th
[x] xcopa -> available in EleutherAI/lm-evaluation-harness
[x] thaisum

EN -> TH datasets (waiting for quality check)