@IsraelAbebe For evaluating the overall performance of the model in a classification task, it is advisable to employ metrics such as Accuracy and F1-score.
@lewiEyasu Yes, we will use the average F1 score.
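For reference, here is a minimal sketch of what the "average F1" could look like using scikit-learn's macro averaging; the label names below are placeholders, not the project's actual classes:

```python
from sklearn.metrics import accuracy_score, f1_score

# Placeholder gold labels and model predictions for a 3-class task.
y_true = ["positive", "negative", "neutral", "positive", "negative"]
y_pred = ["positive", "negative", "negative", "positive", "neutral"]

# Macro-averaged F1 weights every class equally, regardless of class size.
print("Accuracy:", accuracy_score(y_true, y_pred))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
```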
@IsraelAbebe, I've created a new branch named 'eyasu-eval'. In this branch, I've added functions to clean the responses from both Llama and GPT-4 and also calculated the F1 score.
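I haven't seen the exact cleaning functions in 'eyasu-eval', but a rough sketch of the idea (normalizing free-text Llama/GPT-4 responses to a fixed label set before scoring) could look like the following; `LABELS`, `clean_response`, and the fallback label are assumptions for illustration:

```python
import re
from sklearn.metrics import f1_score

# Hypothetical label set; replace with the task's real classes.
LABELS = ["positive", "negative", "neutral"]

def clean_response(raw: str, fallback: str = "neutral") -> str:
    """Map a free-text model response to one of LABELS.

    Returns the first known label found (checking LABELS in order);
    falls back to a default when no known label appears.
    """
    text = raw.lower()
    for label in LABELS:
        if re.search(rf"\b{label}\b", text):
            return label
    return fallback

# Example: clean raw model outputs, then score against gold labels.
raw_outputs = ["The sentiment is Positive.", "Answer: negative", "Hard to say"]
gold = ["positive", "negative", "neutral"]
preds = [clean_response(r) for r in raw_outputs]
print("Macro F1:", f1_score(gold, preds, average="macro", labels=LABELS))
```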
@Tadesse-Destaw @ymitiku How should we calculate the accuracy or F1 score of the results?
This is just work in progress; let's contribute ideas.