EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.
https://www.eleuther.ai
MIT License
6.39k stars 1.69k forks

Support for sequence tagging tasks #1675

Open Khalid-Nabigh opened 5 months ago

Khalid-Nabigh commented 5 months ago

We are trying to evaluate Named Entity Recognition (NER) and Part-of-Speech (POS) tagging tasks, but it is unclear to us how to do that. We've noticed that aclue includes a Named Entity Recognition task, but it is treated as a multiple-choice task, not a sequence tagging task.

haileyschoelkopf commented 5 months ago

Hi!

If I'm remembering right, the question of supporting sequence labeling tasks has come up before. The primary reason we haven't added any is that we're focusing on prompted evaluation of autoregressive LMs, and I'm not aware of a popular + effective framing of NER or POS tagging tasks as a prompted evaluation.

Maybe there have been recent proposals for this that have gotten traction, though. If so, we'd be happy to accept additions of such tasks!

areias commented 4 months ago

Hello, I've been working on using LLMs for NLP tasks like NER and relationship extraction and would be interested in adding these tasks and datasets to the harness. You can see here several papers and datasets on this:

What do you think?
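
For illustration, one possible way to frame NER as a prompted generation task is to ask the model to list entities as `TYPE: text` lines and score the parsed output with exact-match entity-level F1. This is only a minimal sketch under stated assumptions: the prompt template, the output format, and the helper names `build_prompt`, `parse_entities`, and `entity_f1` are all hypothetical and are not part of lm-evaluation-harness.

```python
from typing import List, Set, Tuple

Entity = Tuple[str, str]  # (entity_type, surface_text)


def build_prompt(sentence: str, labels: List[str]) -> str:
    """Hypothetical prompt template: one 'TYPE: text' line per entity."""
    return (
        "List the named entities in the sentence, one per line, as 'TYPE: text'.\n"
        f"Allowed types: {', '.join(labels)}.\n"
        f"Sentence: {sentence}\n"
        "Entities:\n"
    )


def parse_entities(generation: str, labels: List[str]) -> Set[Entity]:
    """Parse model output into (type, text) pairs, skipping malformed lines.

    Assumes the labels are given in upper case (e.g. 'PER', 'ORG', 'LOC').
    """
    allowed = set(labels)
    entities: Set[Entity] = set()
    for line in generation.splitlines():
        if ":" not in line:
            continue  # not in the expected 'TYPE: text' shape
        label, _, text = line.partition(":")
        label, text = label.strip().upper(), text.strip()
        if label in allowed and text:
            entities.add((label, text))
    return entities


def entity_f1(predicted: Set[Entity], gold: Set[Entity]) -> float:
    """Exact-match entity-level F1 over (type, text) pairs."""
    if not predicted and not gold:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    labels = ["PER", "ORG", "LOC"]
    print(build_prompt("Ada Lovelace lived in London.", labels))
    generation = "PER: Ada Lovelace\nLOC: London\n"  # stand-in for a model output
    gold = {("PER", "Ada Lovelace"), ("LOC", "London")}
    print(entity_f1(parse_entities(generation, labels), gold))
```

Because entities are compared as a set of (type, text) pairs, repeated mentions of the same entity collapse into one and character offsets are ignored. A stricter span-level metric would need a different output format.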