EthioNLP / afri-sft-data

This repository generates instruction tuning dataset from different datasets.

calculating evaluation scores #13

Closed IsraelAbebe closed 7 months ago

IsraelAbebe commented 10 months ago

@Tadesse-Destaw @ymitiku How should we calculate the accuracy or F1 score of the results?

This is just a work in progress; let's contribute ideas.

EyasuShiferaw commented 10 months ago

@IsraelAbebe For evaluating the overall performance of the model in a classification task, it is advisable to employ metrics such as Accuracy and F1-score.

IsraelAbebe commented 10 months ago

@lewiEyasu Yes, we will use the average F1 score.
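
For reference, one common reading of "average F1" is the macro average: compute F1 for each class separately and take the unweighted mean. The thread does not say which averaging is meant, so treat that as an assumption. A minimal pure-Python sketch, where the function names `accuracy` and `macro_f1` are illustrative and not taken from this repository:

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the gold label."""
    if not y_true:
        return 0.0
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro averaging is an assumption here)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # gold class t was missed
    # Classes that only appear in predictions still count, with F1 = 0.
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

Macro averaging weights every class equally, so a rare class that the model gets wrong pulls the score down as much as a frequent one. If class frequency should matter, a support-weighted average would be the alternative.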

EyasuShiferaw commented 10 months ago

@IsraelAbebe, I've created a new branch named 'eyasu-eval'. In this branch, I've added functions to clean the responses from both Llama and GPT-4 and to calculate the F1 score.
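
The code in that branch is not shown in this thread. As a rough illustration of what "cleaning" a model response can involve, the sketch below maps a free-form reply to one of a known set of labels so it can be compared with the gold label. The function name `clean_response`, the `fallback` value, and the example labels are all hypothetical.

```python
import re


def clean_response(text, labels, fallback="unknown"):
    """Return the label that appears earliest in the reply, or `fallback`.

    Matching is case-insensitive and on whole words only. This is a
    heuristic: it does not handle negation such as "not positive".
    """
    normalized = text.strip().lower()
    best = None  # (position, label) of the earliest match found so far
    for label in labels:
        pattern = r"\b" + re.escape(label.lower()) + r"\b"
        match = re.search(pattern, normalized)
        if match and (best is None or match.start() < best[0]):
            best = (match.start(), label)
    return best[1] if best else fallback
```

For example, `clean_response("The sentiment is Positive.", ["positive", "negative"])` returns `"positive"`, and a reply that mentions no label returns `"unknown"`, which is then scored as a wrong prediction.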