OpenGPTX / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.
MIT License

XStance Topic Classification #43

Open malteos opened 2 years ago

malteos commented 2 years ago

Use the XStance benchmark for topic classification instead of stance detection.

See https://github.com/OpenGPTX/lm-evaluation-harness/blob/master/lm_eval/tasks/x_stance.py

aishwaryaanegundi commented 1 year ago

Evaluation of topic classification task on XStance dataset

Zero-shot Results

From literature

No published work on topic classification for the XStance dataset was found.

From eval harness

Model: gpt2

With only question in prompt

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| xstance_tc | 0 | acc | 0.1 | |
| | | precision | 0.0303 | |
| | | recall | 0.0909 | |
| | | f1 | 0.0455 | |

With both question and comment included in the prompt

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| xstance_tc | 0 | acc | 0.1 | |
| | | precision | 0.0227 | |
| | | recall | 0.0909 | |
| | | f1 | 0.0364 | |
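
As a sanity check on the numbers above, the reported F1 values appear consistent with the harmonic mean of the reported precision and recall. The sketch below recomputes F1 from the rounded table values. The helper name `f1_from` and the tolerance are illustrative assumptions, and nothing here confirms how the harness itself aggregates these metrics.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (prompt variant, precision, recall, reported f1), copied from the tables above.
REPORTED = [
    ("question only", 0.0303, 0.0909, 0.0455),
    ("question + comment", 0.0227, 0.0909, 0.0364),
]

# Assumed tolerance: the inputs are rounded to four decimals, so allow small drift.
TOLERANCE = 2e-4

for variant, precision, recall, reported_f1 in REPORTED:
    recomputed = f1_from(precision, recall)
    consistent = abs(recomputed - reported_f1) < TOLERANCE
    print(f"{variant}: recomputed f1={recomputed:.4f}, consistent={consistent}")
```

If the check holds, the F1 difference between the two prompt variants comes entirely from precision, since recall is the same in both runs.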

Comments

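Each XStance example consists of a question and a comment. Accuracy (0.1) and recall (0.0909) are identical for both prompt variants; precision and F1 are slightly higher with only the question in the prompt (0.0303 vs. 0.0227 and 0.0455 vs. 0.0364).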
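
The two prompt variants compared above could be built roughly as follows. This is a minimal sketch, not the template used in the actual `xstance_tc` task: the field names `question` and `comment`, the prompt wording, and the function `build_prompt` are all assumptions made for illustration, and the example record is invented.

```python
from typing import Mapping


def build_prompt(doc: Mapping[str, str], include_comment: bool) -> str:
    """Build a zero-shot topic-classification prompt from one example.

    Assumes `doc` has "question" and "comment" keys (hypothetical field names).
    """
    parts = [f"Question: {doc['question']}"]
    if include_comment:
        parts.append(f"Comment: {doc['comment']}")
    # The model is expected to continue the prompt with a topic label.
    parts.append("Topic:")
    return "\n".join(parts)


# Invented record, for illustration only; not taken from the dataset.
example = {
    "question": "Should public transport be expanded?",
    "comment": "Yes, it reduces traffic in cities.",
}

print(build_prompt(example, include_comment=False))
print("---")
print(build_prompt(example, include_comment=True))
```

Keeping the variant behind a single flag makes it easy to run both configurations on the same documents and compare the resulting metrics.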