OpenGPTX / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

Evaluation for x_stance dataset #10

Closed · karina-hensel closed this issue 2 years ago

karina-hensel commented 2 years ago

Scores on X-Stance dataset

Results


German:

| Model | Source | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|
| mGPT (`--num_fewshot=5`) | lm-evaluation-harness | 50.56 | 50.57 | 50.56 | 49.94 |
| fastText | paper (German set) | - | - | - | 69.9* |
| M-BERT | paper (German set) | - | - | - | 76.8* |

French:

| Model | Source | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|
| gpt2 | lm-evaluation-harness | 47.47 | 50.49 | 50.28 | 42.74 |
| fastText | paper (French set) | - | - | - | 71.2* |
| M-BERT | paper (French set) | - | - | - | 76.6* |

(*The F1-scores for the experiments described in the paper are the macro-averages of the F1-scores for ‘favor’ and for ‘against’; accuracy, precision, and recall are not reported in the original paper.)
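
For reference, a sketch of how the mGPT row above could be reproduced with the harness CLI, assuming the OpenGPTX fork keeps the upstream EleutherAI interface; the task name `xstance_de` and the Hugging Face model id `sberbank-ai/mGPT` are assumptions on my part, not taken from this issue:

```bash
# Hypothetical invocation: the task name and pretrained model id are assumptions.
python main.py \
    --model gpt2 \
    --model_args pretrained=sberbank-ai/mGPT \
    --tasks xstance_de \
    --num_fewshot 5
```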
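
To make the footnote's metric concrete, here is a minimal sketch of the macro-averaged F1 using scikit-learn; the stance labels and predictions below are illustrative only:

```python
# Minimal sketch: macro-averaged F1 over the two stance classes,
# as described in the footnote. Labels below are illustrative only.
from sklearn.metrics import f1_score

y_true = ["favor", "against", "against", "favor", "against"]
y_pred = ["favor", "favor", "against", "against", "against"]

# average="macro" takes the unweighted mean of the per-class F1 scores,
# so 'favor' and 'against' count equally regardless of class balance.
print(f1_score(y_true, y_pred, average="macro"))  # 0.5833...
```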