EleutherAI / elk

Keeping language models honest by directly eliciting knowledge encoded in their activations.
MIT License

Use separate CSVs for LR, LM, & reporter eval; support INLP #211

Closed by norabelrose 1 year ago

norabelrose commented 1 year ago

I wanted to see how many orthogonal supervised classifiers can get above-chance performance, which is a proxy for the "dimensionality of the truth subspace." To do this I wrote a quick-and-dirty implementation of Iterative Nullspace Projection (INLP). While INLP was originally proposed for erasing concepts, R-LACE is much better for that purpose. But INLP does make sense if you just want a bunch of orthogonal classifiers!
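For anyone unfamiliar with the algorithm, here is a minimal sketch of the INLP idea (not the implementation in this PR): repeatedly fit a linear probe, record its weight direction, then project the data onto the nullspace of that direction so the next probe is forced to be orthogonal. The function name, thresholds, and stopping rule below are illustrative, using numpy and scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def inlp(X, y, max_iters=10, chance=0.5, tol=0.02, seed=0):
    """Sketch of Iterative Nullspace Projection (INLP).

    Train a linear classifier, project the features onto the nullspace of
    its weight vector, and repeat until the classifier can no longer beat
    chance. `chance` assumes balanced classes; the number of directions
    returned is a rough proxy for the dimensionality of the subspace that
    linearly encodes the label.
    """
    X_proj = X.astype(np.float64).copy()
    directions = []
    for _ in range(max_iters):
        clf = LogisticRegression(max_iter=1000, random_state=seed)
        clf.fit(X_proj, y)
        if clf.score(X_proj, y) < chance + tol:
            break  # no more linearly decodable signal
        w = clf.coef_[0]
        w = w / np.linalg.norm(w)
        directions.append(w)
        # Project every row onto the nullspace of w: x <- x - (x.w) w
        X_proj = X_proj - np.outer(X_proj @ w, w)
    return directions, X_proj
```

Because each round's data has no component along the previously removed directions, each new probe's weight vector comes out (numerically) orthogonal to all earlier ones, which is exactly the "bunch of orthogonal classifiers" property.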

Depends on #210