bigscience-workshop / t-zero

Reproduce results and replicate training of T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)
Apache License 2.0

"Rank classification" in evaluation for multiple choices #42

Open yuchenlin opened 1 year ago

yuchenlin commented 1 year ago

Hi,

Thanks for the repo! I was wondering if you could point out which lines of code implement the "rank classification" idea used to evaluate the multiple-choice style tasks?

The paper describes it like this on Page 6:

For tasks that involve choosing the correct completion from several options (e.g. multiple choice question answering), we follow Brown et al. (2020) and use rank classification to evaluate our model: we compute the log-likelihood of each of the target options under the fine-tuned model and select the option with the highest log-likelihood as the prediction. For simplicity, we do not apply length normalization to the log-likelihoods of the target options.
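If I understand the quoted paragraph correctly, the scoring step amounts to something like the following (my own hypothetical sketch, not code from this repo): given the per-token log-probabilities of each answer option under the model, sum them per option and take the argmax.

```python
import math

def rank_classify(option_token_logprobs):
    """Pick the option whose target tokens have the highest summed
    log-likelihood under the model (no length normalization, as in T0)."""
    scores = [sum(logprobs) for logprobs in option_token_logprobs]
    return max(range(len(scores)), key=scores.__getitem__)

# e.g. three options, with per-token log-probs already computed by the model
options = [
    [math.log(0.5), math.log(0.1)],  # option 0: summed log-lik ~= -3.00
    [math.log(0.6), math.log(0.4)],  # option 1: summed log-lik ~= -1.43
    [math.log(0.2)],                 # option 2: summed log-lik ~= -1.61
]
prediction = rank_classify(options)  # option 1 wins
```

One consequence of skipping length normalization (as the paper notes) is that summed log-likelihoods tend to favor shorter options, since every extra token adds a negative term.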

Thank you!

yuchenlin commented 1 year ago

Ah, I think I found it, in the forward function of the custom EncoderDecoderModel class: https://github.com/bigscience-workshop/t-zero/blob/25c0761427f3894a8ec5a062a075b96037fb1492/t0/model.py#L56

However, I was wondering if you could give a short tutorial on how to use the same idea to easily evaluate other LMs (say, a fine-tuned BART), to make sure the comparisons are fair.
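For concreteness, here is my rough understanding of how the per-option scoring could look for any encoder-decoder model that produces decoder logits, stated as a standalone function over those logits (a sketch with made-up names, assuming torch; please correct me if this differs from the repo's implementation):

```python
import torch

def score_options(logits, target_ids, pad_token_id=0):
    """Rank-classification scoring: for each answer option, sum the
    log-probability of its target tokens (no length normalization).

    logits:     [num_options, seq_len, vocab_size] decoder logits, one
                row per option, each conditioned on the same input
    target_ids: [num_options, seq_len] tokenized answer options
    """
    log_probs = torch.log_softmax(logits, dim=-1)
    # log-prob assigned to each target token: [num_options, seq_len]
    token_scores = log_probs.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)
    # mask out padding positions so they do not contribute to the sum
    mask = (target_ids != pad_token_id).float()
    return (token_scores * mask).sum(dim=-1)

# toy example: 2 options, 3 decoder positions, vocab of 5
logits = torch.randn(2, 3, 5)
target_ids = torch.tensor([[1, 2, 0], [3, 4, 2]])
prediction = score_options(logits, target_ids).argmax().item()
```

To evaluate a fine-tuned BART this way, one would run the model once per option (same source, each option as decoder labels), collect the logits, and compare the summed scores, which is why identical scoring across models should keep the comparison fair.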