Fit reporters on input embeddings as a sanity check

norabelrose commented 1 year ago

I noticed that on some (model, dataset) pairs the VINC and/or logistic regression AUROC is already rather high after the very first layer, which seemed moderately implausible to me. It struck me that we can fit reporters on the input embeddings as a sanity check. If the input embedding AUROC is significantly higher than 0.5, there must be something wrong with the code, the prompt templates, or both, since you can't classify a statement as true or false by looking only at its very last token.
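To make the check concrete, here is a minimal, self-contained sketch in plain Python. It is not the elk implementation. The embedding table, the `embedding_auroc` helper, and the two "templates" are all hypothetical stand-ins: random vectors play the role of input embeddings, and a small hand-written logistic regression plays the role of the reporter. The first setup picks the last token independently of the label. The second deliberately leaks the label through the last token, which is the kind of template bug this check is meant to catch.

```python
import math
import random


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_logreg(X, y, lr=0.1, epochs=200):
    """Logistic regression trained with full-batch gradient descent."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(dim):
                grad_w[j] += (p - t) * x[j]
            grad_b += p - t
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def embedding_auroc(last_token_for_label, rng, vocab_size=20, dim=8, n=400):
    """Fit a reporter on last-token input embeddings; return held-out AUROC."""
    # Hypothetical input-embedding table: one random vector per token id.
    table = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]

    def sample(count):
        labels = [rng.randint(0, 1) for _ in range(count)]
        feats = [table[last_token_for_label(y, rng)] for y in labels]
        return feats, labels

    X_train, y_train = sample(n)
    X_test, y_test = sample(n)
    w, b = fit_logreg(X_train, y_train)
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in X_test]
    return auroc(scores, y_test)


rng = random.Random(0)
# Clean template: the last token carries no information about the label.
clean = embedding_auroc(lambda y, r: r.randrange(20), rng)
# Leaky template: the last token is determined by the label.
leaky = embedding_auroc(lambda y, r: y, rng)
print(f"clean AUROC: {clean:.3f}")
print(f"leaky AUROC: {leaky:.3f}")
```

With a clean template one would expect the first number to sit near chance, while the leaky template should score far above it. In a real run, an input-embedding AUROC well above 0.5 would point at the templates or the extraction code rather than at anything the model has computed.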

thejaminator commented 1 year ago

lgtm!

EleutherAI / elk #209