EleutherAI / elk

Keeping language models honest by directly eliciting knowledge encoded in their activations.
MIT License
186 stars 33 forks

"None" ensembling for classification accuracy #290

Closed derpyplops closed 1 year ago

derpyplops commented 1 year ago

Closes NOT-372

At the moment, accuracy computed with "none" ensembling is identical to accuracy computed with "partial" ensembling.

This PR implements a reasonable interpretation of what "none" ensembling should mean for classification accuracy: for both accuracy and calibrated accuracy, inference uses only the positive hiddens. It also adds logging for cal_thresh.
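
To illustrate the distinction, here is a minimal pure-Python sketch. It is not the code in this PR. It assumes each example carries one hypothetical `(negative_logit, positive_logit)` pair per prompt variant, that "partial" ensembling scores each variant by the difference between the two, and that "none" scores each variant by the positive logit alone. The names `variant_scores` and `accuracy`, the data layout, and the `thresh` parameter (a stand-in for a calibrated threshold such as cal_thresh) are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: (negative_logit, positive_logit) for one prompt variant.
Pair = Tuple[float, float]


def variant_scores(pairs: Sequence[Pair], ensembling: str) -> List[float]:
    """Return one score per prompt variant under the given ensembling mode."""
    if ensembling == "none":
        # Assumed reading of "none": use only the positive logit.
        return [pos for _, pos in pairs]
    if ensembling == "partial":
        # Assumed reading of "partial": contrast positive against negative.
        return [pos - neg for neg, pos in pairs]
    raise ValueError(f"unsupported ensembling mode: {ensembling!r}")


def accuracy(
    examples: Sequence[Sequence[Pair]],
    labels: Sequence[int],
    ensembling: str,
    thresh: float = 0.0,
) -> float:
    """Fraction of per-variant predictions (score > thresh) matching the label."""
    correct = 0
    total = 0
    for pairs, label in zip(examples, labels):
        for score in variant_scores(pairs, ensembling):
            correct += int((score > thresh) == bool(label))
            total += 1
    return correct / total
```

With this setup the two modes can disagree on the same inputs, which is the behavior the PR aims for: before the change, both modes produced the same accuracy.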

CLAassistant commented 1 year ago

CLA assistant check
All committers have signed the CLA.