[ ] RTE (Recognizing Textual Entailment), translated to Thai
Summarization
[ ] ThaiSum
Translation
[ ] Please consult AJ Peerachet to select the tasks for evaluation
Token Classification
[ ] LST20 (POS, NER, Sentence, Clause boundary)
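For LST20, most of the work in converting to the Hugging Face dataset format is grouping token lines into sentences. The sketch below assumes the corpus files are tab-separated with one token per line in the column order word, POS, NE, clause boundary, and blank lines between sentences; that layout and the field names (tokens, pos_tags, ner_tags, clause_tags) are assumptions to check against the actual LST20 release. It emits JSON Lines, one record per sentence.

```python
import json
from typing import Iterable, Iterator

# Assumed column order for each non-empty line: word, POS, NE, clause boundary.
COLUMNS = ("tokens", "pos_tags", "ner_tags", "clause_tags")


def read_column_file(lines: Iterable[str]) -> Iterator[dict]:
    """Group tab-separated token lines into one record per sentence.

    Blank lines are treated as sentence boundaries (assumption).
    """
    record = {name: [] for name in COLUMNS}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            if record["tokens"]:
                yield record
                record = {name: [] for name in COLUMNS}
            continue
        fields = line.split("\t")
        if len(fields) < len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}: {line!r}")
        for name, value in zip(COLUMNS, fields):
            record[name].append(value)
    if record["tokens"]:
        yield record  # the last sentence may not be followed by a blank line


def to_jsonl(records: Iterable[dict]) -> str:
    """Serialize records as JSON Lines, keeping Thai text unescaped."""
    lines = [json.dumps({"id": str(i), **rec}, ensure_ascii=False)
             for i, rec in enumerate(records)]
    return "".join(line + "\n" for line in lines)
```

The resulting text can be written to a .jsonl file (UTF-8) and loaded with datasets.load_dataset("json", data_files=...), after which save_to_disk can store it on Lanta; the storage path is not specified in this plan.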
Expected outcome
1. Convert all tasks above into the Hugging Face dataset format and save them to Lanta
2. Translate all English tasks into Thai. Use mUSE or another multilingual sentence encoder to improve translation quality if necessary
3. Create prompt templates using PromptSource for all tasks listed above
4. Write a script to evaluate all tasks (the Hugging Face evaluate library should help) and open a PR
[ ] Zero-shot
[ ] Few-shot
The script should accept the following inputs:
4.1 Hugging Face pretrained causal language model (AutoModelForCausalLM) name or path
4.2 Dataset name (from the list above) to be evaluated
Reference: https://github.com/EleutherAI/lm-evaluation-harness
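One possible way to use a multilingual sentence encoder such as mUSE in the translation step is to embed each English source and its Thai translation, then flag pairs whose cosine similarity falls below a threshold for re-translation or manual review. This is a minimal sketch under stated assumptions: encode is a hypothetical callable mapping a list of sentences to vectors (for mUSE it would wrap the TensorFlow Hub model), and the 0.7 threshold is an illustrative placeholder, not a tuned value.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flag_low_similarity(
    sources: Sequence[str],
    translations: Sequence[str],
    encode: Callable[[Sequence[str]], Sequence[Vector]],  # hypothetical encoder wrapper
    threshold: float = 0.7,  # placeholder; tune on a manually checked sample
) -> List[int]:
    """Return indices of (source, translation) pairs whose embeddings disagree."""
    if len(sources) != len(translations):
        raise ValueError("sources and translations must be aligned")
    src_vecs = encode(sources)
    tgt_vecs = encode(translations)
    return [i for i, (s, t) in enumerate(zip(src_vecs, tgt_vecs))
            if cosine(s, t) < threshold]
```

Flagging rather than silently dropping keeps the translated datasets the same size as the English originals, so scores remain comparable across languages.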
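A minimal sketch of the evaluation script's entry point, assuming an argparse interface: --model takes a Hugging Face model name or local path, --task is restricted to known task names, and --num-fewshot switches between zero-shot (0) and few-shot prompting. The flag names, the TASKS identifiers and the prompt layout are hypothetical, and str.format stands in for PromptSource's Jinja templates. Model loading uses the real AutoModelForCausalLM.from_pretrained and AutoTokenizer.from_pretrained, imported lazily so the CLI can be tested without transformers installed.

```python
import argparse
import random
from typing import Optional, Sequence

# Hypothetical task identifiers; extend with the remaining tasks from the plan.
TASKS = ("rte_th", "thaisum", "lst20")


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Evaluate a causal LM on Thai tasks.")
    parser.add_argument("--model", required=True,
                        help="Hugging Face model name or local path of a causal LM")
    parser.add_argument("--task", required=True, choices=TASKS,
                        help="task (dataset) to evaluate")
    parser.add_argument("--num-fewshot", type=int, default=0,
                        help="number of demonstrations; 0 means zero-shot")
    parser.add_argument("--seed", type=int, default=1234,
                        help="seed for sampling few-shot demonstrations")
    args = parser.parse_args(argv)
    if args.num_fewshot < 0:
        parser.error("--num-fewshot must be >= 0")
    return args


def build_prompt(template: str, query: dict, train_pool: Sequence[dict],
                 num_fewshot: int, seed: int = 1234) -> str:
    """Prepend `num_fewshot` solved examples (template + answer) to the query prompt."""
    rng = random.Random(seed)  # fixed seed keeps the chosen demonstrations reproducible
    shots = rng.sample(list(train_pool), k=num_fewshot) if num_fewshot else []
    blocks = [template.format(**ex) + " " + ex["answer"] for ex in shots]
    blocks.append(template.format(**query))  # the model must complete this one
    return "\n\n".join(blocks)


def load_model(name_or_path: str):
    """Load tokenizer and causal LM; imported lazily to keep the CLI lightweight."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(name_or_path)
    model = AutoModelForCausalLM.from_pretrained(name_or_path)
    return tokenizer, model
```

Example invocation, assuming the file is saved as a hypothetical evaluate_thai.py: python evaluate_thai.py --model <model name or path> --task rte_th --num-fewshot 5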
Checklist for the evaluation pipeline
1. Automatic Evaluation of Thai Tasks:
Sequence Classification
QA
Classification
Summarization
Translation
Token Classification
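For the QA and classification categories, simple reference-based metrics are usually enough. Below is a minimal sketch of exact match (which doubles as accuracy when outputs are labels) and SQuAD-style token F1. Because Thai is written without spaces between words, the tokenizer is a parameter; the character-level default is an assumption, and a Thai word tokenizer such as PyThaiNLP's word_tokenize could be passed in instead. In the actual script, ready-made metrics from the Hugging Face evaluate library (e.g. evaluate.load("accuracy")) would normally be preferred.

```python
from collections import Counter
from typing import Callable, List, Sequence

Tokenizer = Callable[[str], List[str]]


def char_tokenize(text: str) -> List[str]:
    """Fallback tokenizer: all non-space characters (assumption for unsegmented Thai)."""
    return [ch for ch in text if not ch.isspace()]


def _check(predictions: Sequence[str], references: Sequence[str]) -> None:
    if len(predictions) != len(references) or not references:
        raise ValueError("need equally sized, non-empty prediction/reference lists")


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions equal to the reference after trimming whitespace."""
    _check(predictions, references)
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)


def token_f1(prediction: str, reference: str, tokenize: Tokenizer = char_tokenize) -> float:
    """Harmonic mean of token precision and recall over the multiset overlap."""
    pred, gold = tokenize(prediction), tokenize(reference)
    if not pred or not gold:
        return float(pred == gold)  # both empty -> 1.0, only one empty -> 0.0
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_token_f1(predictions: Sequence[str], references: Sequence[str],
                  tokenize: Tokenizer = char_tokenize) -> float:
    _check(predictions, references)
    scores = [token_f1(p, r, tokenize) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)
```

Character-level F1 tends to be more lenient than word-level F1, so reported QA scores should state which tokenizer was used.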
Expected outcome
2. Automatic Evaluation of English Tasks: