Stability-AI / lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.
MIT License

Clarification about JSQuAD #90

Status: Closed (sedrick-keh-tri closed this issue 1 year ago)

sedrick-keh-tri commented 1 year ago

Thanks for your great work on this leaderboard!

I was wondering if the reported scores for JSQuAD on the README are F1 scores or exact match scores.

mkshing commented 1 year ago

It's exact match.
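
For readers unsure how the two metrics differ, here is a minimal sketch of SQuAD-style exact match and F1. It is not the harness's actual implementation: the function names, the whitespace-only normalization, and the character-level tokenization (`tokenize=list`, picked here because Japanese text has no spaces) are all assumptions made for illustration.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 only if the stripped strings are identical, else 0.0."""
    return float(prediction.strip() == reference.strip())


def f1_score(prediction: str, reference: str, tokenize=list) -> float:
    """Token-overlap F1. `tokenize=list` splits into characters (an assumption)."""
    pred_tokens = tokenize(prediction.strip())
    ref_tokens = tokenize(reference.strip())
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both the prediction and the reference.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# A prediction with one extra character scores 0 on exact match
# but still earns partial credit under F1.
em = exact_match("東京都", "東京")
f1 = f1_score("東京都", "東京")
```

Because exact match gives no partial credit, exact-match scores are never higher than F1 scores on the same predictions, so the README numbers should be read as the stricter of the two.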