bigscience-workshop / t-zero

Reproduce results and replicate training of T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)
Apache License 2.0

"Rank classification" in evaluation for multiple choices #42

Open yuchenlin opened 1 year ago

yuchenlin commented 1 year ago

Hi,

Thanks for the repo! I was wondering if you could point out which lines of code implement the "rank classification" idea used to evaluate the multiple-choice style tasks?

The paper describes it like this on Page 6:

For tasks that involve choosing the correct completion from several options (e.g. multiple choice question answering), we follow Brown et al. (2020) and use rank classification to evaluate our model: we compute the log-likelihood of each of the target options under the fine-tuned model and select the option with the highest log-likelihood as the prediction. For simplicity, we do not apply length normalization to the log-likelihoods of the target options.
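If I understand the quoted paragraph correctly, the scoring step amounts to something like the following (my own hypothetical sketch, not code from this repo): given the per-token log-probabilities of each answer option under the model, sum them per option and take the argmax.

```python
import math

def rank_classify(option_token_logprobs):
    """Pick the option whose target tokens have the highest summed
    log-likelihood under the model (no length normalization, as in T0)."""
    scores = [sum(logprobs) for logprobs in option_token_logprobs]
    return max(range(len(scores)), key=scores.__getitem__)

# e.g. three options, with per-token log-probs already computed by the model
options = [
    [math.log(0.5), math.log(0.1)],  # option 0: summed log-lik ~= -3.00
    [math.log(0.6), math.log(0.4)],  # option 1: summed log-lik ~= -1.43
    [math.log(0.2)],                 # option 2: summed log-lik ~= -1.61
]
prediction = rank_classify(options)  # option 1 wins
```

One consequence of skipping length normalization (as the paper notes) is that summed log-likelihoods tend to favor shorter options, since every extra token adds a negative term.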

Thank you!

yuchenlin commented 1 year ago

Ah, I think I found it, in the forward function of the custom EncoderDecoderModel class: https://github.com/bigscience-workshop/t-zero/blob/25c0761427f3894a8ec5a062a075b96037fb1492/t0/model.py#L56

However, I was wondering if you could give a short tutorial on how to use the same idea to easily evaluate other LMs (say, a fine-tuned BART), to make sure the comparisons are fair.
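For concreteness, here is my rough understanding of how the per-option scoring could look for any encoder-decoder model that produces decoder logits, stated as a standalone function over those logits (a sketch with made-up names, assuming torch; please correct me if this differs from the repo's implementation):

```python
import torch

def score_options(logits, target_ids, pad_token_id=0):
    """Rank-classification scoring: for each answer option, sum the
    log-probability of its target tokens (no length normalization).

    logits:     [num_options, seq_len, vocab_size] decoder logits, one
                row per option, each conditioned on the same input
    target_ids: [num_options, seq_len] tokenized answer options
    """
    log_probs = torch.log_softmax(logits, dim=-1)
    # log-prob assigned to each target token: [num_options, seq_len]
    token_scores = log_probs.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)
    # mask out padding positions so they do not contribute to the sum
    mask = (target_ids != pad_token_id).float()
    return (token_scores * mask).sum(dim=-1)

# toy example: 2 options, 3 decoder positions, vocab of 5
logits = torch.randn(2, 3, 5)
target_ids = torch.tensor([[1, 2, 0], [3, 4, 2]])
prediction = score_options(logits, target_ids).argmax().item()
```

To evaluate a fine-tuned BART this way, one would run the model once per option (same source, each option as decoder labels), collect the logits, and compare the summed scores, which is why identical scoring across models should keep the comparison fair.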