EleutherAI / elk

Keeping language models honest by directly eliciting knowledge encoded in their activations.
MIT License
175 stars 32 forks

Use concept-erasure implementation of LEACE and SAL #252

Closed norabelrose closed 1 year ago

norabelrose commented 1 year ago

Now that concept-erasure is on PyPI, we can outsource our ConceptEraser implementation to that repo.

This PR makes LEACE, rather than SAL, the default method for pseudolabel and prompt template normalization. I should probably add a config option to change it though.
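A minimal sketch of what such a config option might look like, assuming a simple string switch that defaults to LEACE. The names `ERASURE_METHODS` and `resolve_erasure_method` are hypothetical and are not part of elk or concept-erasure.

```python
# Hypothetical sketch: neither of these names exists in elk or concept-erasure.
ERASURE_METHODS = ("leace", "sal")


def resolve_erasure_method(requested=None):
    """Return the normalization method to use, defaulting to LEACE."""
    method = "leace" if requested is None else requested.lower()
    if method not in ERASURE_METHODS:
        raise ValueError(
            f"Unknown erasure method {requested!r}; expected one of {ERASURE_METHODS}"
        )
    return method
```

Leaving the option unset keeps the new LEACE default, while passing `"sal"` would restore the previous behavior.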

artkpv commented 12 months ago

FYI, my probes / reporters no longer load with this PR because I used Reporter.load. https://github.com/EleutherAI/elk/pull/252/files#diff-d08b84a509f043deeb98c9c642f692fffbd1967486738d2ff242b7897eb0b1ae

norabelrose commented 12 months ago

Sorry about that, we can't really guarantee backward compatibility at this point. You should be able to load the reporters with an older commit and extract the raw weights if necessary.
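The workaround above could be sketched roughly as follows: export only the learned parameters while the old class is still importable, then rebuild from them on the new commit. `LegacyReporter` and `CurrentReporter` are hypothetical stand-ins, the `weight` and `bias` attribute names are assumptions, and JSON is used only to keep the example dependency-free. None of this is elk's actual API.

```python
import json
import os
import tempfile


class LegacyReporter:
    """Hypothetical stand-in for a reporter loaded at the older commit."""

    def __init__(self, weight, bias):
        self.weight = weight
        self.bias = bias


class CurrentReporter:
    """Hypothetical stand-in for the reporter class after the refactor."""

    def __init__(self, weight, bias):
        self.weight = weight
        self.bias = bias


def export_raw_weights(reporter, path):
    """Write only the learned parameters as plain JSON, with no class reference."""
    raw = {"weight": list(reporter.weight), "bias": float(reporter.bias)}
    with open(path, "w") as f:
        json.dump(raw, f)


def import_raw_weights(path):
    """Rebuild a reporter from the plain parameters using the current class."""
    with open(path) as f:
        raw = json.load(f)
    return CurrentReporter(weight=raw["weight"], bias=raw["bias"])


def roundtrip_demo():
    """Export from the legacy object, then restore into the current class."""
    legacy = LegacyReporter(weight=[0.5, -1.0], bias=0.25)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "reporter_weights.json")
        export_raw_weights(legacy, path)
        return import_raw_weights(path)
```

In practice the export step would run on the older commit and the import step on the newer one. For PyTorch modules, the analogous move is to save the module's `state_dict()` rather than the whole pickled object, since a state dict does not depend on the class definition staying unchanged.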