
[RFP] Qualitative evaluations of mixing language models #24

Closed: kingoflolz closed this issue 1 year ago

kingoflolz commented 3 years ago

Background

Scaling laws show that loss follows a smooth curve as a function of compute, data, etc., but it is an open question whether loss alone accurately captures qualitative performance.

What to plot?

Take several pretrained language models (GPT-Neo, GPT-J, GPT-2, etc.) and create a scatterplot of human evaluation scores vs. loss.
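
A minimal sketch of the loss axis of that scatterplot, assuming Hugging Face checkpoints (`gpt2`, `EleutherAI/gpt-neo-1.3B`, etc.) and a held-out text set; the human evaluation scores would come from a separate annotation study and are only indicated here as a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODELS = [
    "gpt2",
    "EleutherAI/gpt-neo-1.3B",
    "EleutherAI/gpt-neo-2.7B",
    "EleutherAI/gpt-j-6B",
]

@torch.no_grad()
def eval_loss(model_name, texts, device="cuda"):
    """Mean next-token cross-entropy of a checkpoint on held-out texts."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).to(device).eval()
    losses = []
    for text in texts:
        ids = tok(text, return_tensors="pt",
                  truncation=True, max_length=1024).input_ids.to(device)
        # Passing labels makes the model return the shifted LM loss directly.
        losses.append(model(ids, labels=ids).loss.item())
    return sum(losses) / len(losses)

heldout = ["..."]  # stand-in; use a real held-out set (e.g. a Pile validation slice)
losses = {name: eval_loss(name, heldout) for name in MODELS}
# Scatter these against human evaluation scores once annotations are collected:
# plt.scatter(list(losses.values()), human_scores)
```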

Additionally, mix the logits of different models during generation so as to match the loss of another model. E.g., mix the logits of GPT-J with those of GPT-Neo 1.3B in the ratio that yields the same loss as GPT-Neo 2.7B, then compare the human evaluation scores of this mixed model with those of the original iso-loss model.
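
A sketch of the mixing step, assuming two Hugging Face causal LMs that share the GPT-2 BPE vocabulary (GPT-J pads its output layer to 50400 entries, so logits are sliced to the shared range). The linear interpolation weight `alpha`, the bisection calibration, and all function names here are illustrative assumptions, not a prescribed method; the calibration additionally assumes the mixed loss varies monotonically with `alpha`:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"
tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
model_a = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B").to(device).eval()
model_b = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B").to(device).eval()

@torch.no_grad()
def mixed_loss(alpha, texts):
    """Next-token cross-entropy of alpha * logits_A + (1 - alpha) * logits_B."""
    total, count = 0.0, 0
    for text in texts:
        ids = tok(text, return_tensors="pt",
                  truncation=True, max_length=512).input_ids.to(device)
        la, lb = model_a(ids).logits, model_b(ids).logits
        v = min(la.shape[-1], lb.shape[-1])  # GPT-J pads its vocab; keep the shared slice
        logits = alpha * la[..., :v] + (1 - alpha) * lb[..., :v]
        total += F.cross_entropy(logits[0, :-1], ids[0, 1:], reduction="sum").item()
        count += ids.shape[-1] - 1
    return total / count

def calibrate_alpha(target_loss, texts, iters=20):
    """Bisect for the alpha whose mixed loss matches, e.g., GPT-Neo 2.7B's loss.

    Assumes mixed loss decreases monotonically as alpha shifts weight
    toward the stronger model (model_a)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mixed_loss(mid, texts) > target_loss:
            lo = mid  # still above target: give model_a more weight
        else:
            hi = mid
    return (lo + hi) / 2

@torch.no_grad()
def mixed_generate(prompt, alpha, max_new_tokens=50):
    """Sample from the interpolated logits one token at a time (no KV cache)."""
    ids = tok(prompt, return_tensors="pt").input_ids.to(device)
    for _ in range(max_new_tokens):
        la = model_a(ids).logits[:, -1, :]
        lb = model_b(ids).logits[:, -1, :]
        v = min(la.shape[-1], lb.shape[-1])
        probs = F.softmax(alpha * la[:, :v] + (1 - alpha) * lb[:, :v], dim=-1)
        ids = torch.cat([ids, torch.multinomial(probs, 1)], dim=-1)
    return tok.decode(ids[0])
```

One design note: interpolating logits is not the same as interpolating output probabilities (a mixture-of-experts-style ensemble). The proposal asks for logit mixing, but both are reasonable iso-loss constructions and could be compared in the same evaluation.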

Related Papers/Frameworks

N/A