Checklist for the evaluation pipeline

1. Automatic evaluation tasks for Thai:

- [x] XQuAD (multilingual)
- [x] HellaSwag (translated to Thai)
- [ ] SuperGLUE (translated to Thai)
  - [ ] COPA
  - [x] ReCoRD
  - [x] MultiRC
  - [x] RTE (added)

Convert all of the tasks above into Hugging Face dataset format and save them to Lanta.
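One possible route for the conversion is to normalize each task into JSON Lines first, since the Hugging Face `datasets` library can load JSON Lines directly. The sketch below only handles serialization and schema checking; the RTE-style field names are illustrative assumptions, not a required schema.

```python
import json
from typing import Dict, List


def records_to_jsonl(records: List[Dict]) -> str:
    """Serialize task examples as JSON Lines, one example per line.

    All records must share the same keys so that `datasets` can infer a
    single schema when the file is loaded.
    """
    if not records:
        raise ValueError("no records to serialize")
    keys = set(records[0])
    lines = []
    for i, rec in enumerate(records):
        if set(rec) != keys:
            raise ValueError(f"record {i} has keys {sorted(rec)}, expected {sorted(keys)}")
        # ensure_ascii=False keeps Thai text human-readable in the file.
        lines.append(json.dumps(rec, ensure_ascii=False))
    return "\n".join(lines) + "\n"
```

Each split could then be written with `Path(...).write_text(..., encoding="utf-8")`, loaded with `datasets.load_dataset("json", data_files={"validation": ".../validation.jsonl"})`, and stored with `save_to_disk(...)` under a directory on Lanta (the actual path is left open here).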
Translate all English tasks into Thai. If necessary, use mUSE (Multilingual Universal Sentence Encoder) or another encoder to improve the translation results.
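One way a multilingual encoder such as mUSE could help is as a quality filter: embed each English source and its Thai translation into the same vector space and flag pairs with low cosine similarity for re-translation or manual review. This is an assumed approach, not one fixed by the checklist. The sketch takes the encoder as a plain callable, and the 0.7 threshold is a hypothetical starting point that would need tuning on real data.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def flag_suspect_translations(
    sources: List[str],
    translations: List[str],
    embed: Callable[[List[str]], List[Vector]],  # e.g. a wrapper around mUSE (assumed)
    threshold: float = 0.7,  # hypothetical cut-off
) -> List[Tuple[int, float]]:
    """Return (index, similarity) for aligned pairs whose similarity is below threshold."""
    if len(sources) != len(translations):
        raise ValueError("sources and translations must be aligned")
    src_vecs = embed(sources)
    tgt_vecs = embed(translations)
    flagged = []
    for i, (s, t) in enumerate(zip(src_vecs, tgt_vecs)):
        sim = cosine(s, t)
        if sim < threshold:
            flagged.append((i, sim))
    return flagged
```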
Create prompt templates with PromptSource for all of the tasks listed above.
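PromptSource templates are written in Jinja, with `|||` separating the model input from the target. Before authoring templates in PromptSource itself, it may help to prototype the wording with a small stand-in. The sketch below is a hypothetical substitute that uses `str.format` fields instead of Jinja; it is not the PromptSource API, and the RTE wording and template name are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PromptTemplate:
    """Hypothetical stand-in for a PromptSource template.

    `text` uses str.format fields instead of Jinja; "|||" separates the
    model input from the target, following PromptSource's convention.
    """
    name: str
    text: str
    answer_choices: List[str]

    def apply(self, example: Dict) -> Tuple[str, str]:
        # Split the template before formatting so "|||" inside data is harmless.
        input_tpl, target_tpl = self.text.split("|||", 1)
        # Expose the gold answer string so the target side can reference it.
        fields = dict(example, answer=self.answer_choices[example["label"]])
        return input_tpl.format(**fields).strip(), target_tpl.format(**fields).strip()


# SuperGLUE RTE uses label 0 = entailment, 1 = not_entailment.
rte_template = PromptTemplate(
    name="rte_true_false",  # hypothetical template name
    text="{premise}\nQuestion: {hypothesis} True or False? ||| {answer}",
    answer_choices=["True", "False"],
)
```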
Write a script to evaluate all tasks (the Hugging Face evaluate library should help) and open a PR:
- [ ] Zero-shot
- [ ] Few-shot

The script should accept as input:
- a Hugging Face pretrained causal language model, given as a model name or path
- the name of the dataset (from the list above) to evaluate

Reference: https://github.com/EleutherAI/lm-evaluation-harness
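A minimal sketch of the script's entry point, assuming hypothetical task identifiers and flag names: `--model` takes a model name or local path (which `AutoModelForCausalLM.from_pretrained` accepts in either form), `--task` restricts input to the listed datasets, and `--num-fewshot` switches between zero-shot (`0`) and k-shot prompting. Model loading and scoring are left out. The few-shot pool is assumed to come from a split other than the one being evaluated.

```python
import argparse
import random
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical task identifiers mirroring the checklist above.
TASKS = ["xquad_th", "hellaswag_th", "copa_th", "record_th", "multirc_th", "rte_th"]


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Evaluate a causal LM on Thai tasks.")
    parser.add_argument("--model", required=True,
                        help="Hugging Face model name or local path to a model directory")
    parser.add_argument("--task", required=True, choices=TASKS)
    parser.add_argument("--num-fewshot", type=int, default=0,
                        help="0 for zero-shot; k > 0 for k-shot prompting")
    parser.add_argument("--seed", type=int, default=1234)
    return parser.parse_args(argv)


def build_prompt(
    apply_template: Callable[[Dict], Tuple[str, str]],
    fewshot_pool: Sequence[Dict],
    query: Dict,
    k: int,
    rng: random.Random,
    sep: str = "\n\n",
) -> Tuple[str, str]:
    """Prepend k solved demonstrations to the query; return (prompt, gold target)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if k > len(fewshot_pool):
        raise ValueError("not enough few-shot examples")
    # Sampling with a seeded RNG keeps the chosen demonstrations reproducible.
    shots = rng.sample(list(fewshot_pool), k) if k else []
    demos = [f"{p} {t}" for p, t in map(apply_template, shots)]
    prompt, target = apply_template(query)
    return sep.join(demos + [prompt]), target
```

With `k = 0` the prompt is just the templated query, so the zero-shot and few-shot checkboxes can share one code path.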

Use lm-evaluation-harness and try to evaluate a model on the Hugging Face dashboard.
Make sure lm-evaluation-harness can run with a private Hugging Face model (a model saved in Hugging Face format that has not yet been uploaded to the Hugging Face Hub).
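For a model that exists only on local disk, the harness's `hf` backend can be pointed at a directory through `pretrained=` in `--model_args`. The sketch below assumes the v0.4-style `lm_eval` command line (older releases used a different entry point, so check the installed version). It checks that the directory looks like a saved Hugging Face model before building the command. The weight-file patterns cover common layouts but are an assumption, and Thai task names such as `xquad_th` are hypothetical and would need to be registered with the harness first.

```python
from pathlib import Path
from typing import List, Sequence

# Common weight layouts written by save_pretrained (assumed, not exhaustive).
WEIGHT_PATTERNS = ("*.safetensors", "pytorch_model*.bin")


def check_local_hf_model(model_dir: str) -> Path:
    """Fail early if model_dir does not look like a saved Hugging Face model."""
    path = Path(model_dir)
    if not (path / "config.json").is_file():
        raise FileNotFoundError(f"{path} has no config.json")
    if not any(any(path.glob(p)) for p in WEIGHT_PATTERNS):
        raise FileNotFoundError(f"{path} has no *.safetensors or pytorch_model*.bin weights")
    resolved = path.resolve()
    # --model_args is a comma-separated key=value string, so a comma in the path would break it.
    if "," in str(resolved):
        raise ValueError(f"model path must not contain a comma: {resolved}")
    return resolved


def lm_eval_command(model_dir: str, tasks: Sequence[str], num_fewshot: int = 0,
                    batch_size: int = 8, output_path: str = "results") -> List[str]:
    """Build an argv list for the lm_eval CLI using a local model directory."""
    model_path = check_local_hf_model(model_dir)
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_path}",
        "--tasks", ",".join(tasks),
        "--num_fewshot", str(num_fewshot),
        "--batch_size", str(batch_size),
        "--output_path", output_path,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. The tokenizer is expected to be saved in the same directory, because `pretrained=` is also used to load it unless a separate tokenizer argument is given.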