EleutherAI / concept-erasure

Erasing concepts from neural representations with provable guarantees
MIT License

Sample Code for Simple Use Case? #1

Closed: nickmitchko closed this issue 1 year ago

nickmitchko commented 1 year ago

Hi, would it be possible to provide a very simple sample showing how to patch a LLaMA model so that it no longer represents one specific text concept? The sample in the README is somewhat confusing.
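A minimal sketch of what such a sample could look like, assuming you have already collected an `(n, d)` matrix of hidden states from one layer of the model and a concept label for each token. Collecting the activations and hooking the eraser back into the LLaMA forward pass are not shown. The concept-erasure package itself ships a `LeaceEraser` class for the fitting step; the code below is instead an independent NumPy re-implementation of the closed-form LEACE formula r(x) = x - W⁺ P W (x - μ), as we understand it from the paper, so check the README for the library's actual API. The function name `fit_leace_like_eraser` and the synthetic data are hypothetical.

```python
import numpy as np


def fit_leace_like_eraser(X, Z, eps=1e-10):
    """Fit a closed-form linear concept eraser (a from-scratch sketch of the
    LEACE formula r(x) = x - W^+ P W (x - mu), not the concept-erasure API).

    X: (n, d) array of hidden states; Z: (n,) or (n, k) concept labels.
    Returns a function mapping (m, d) arrays to erased (m, d) arrays.
    """
    X = np.asarray(X, dtype=np.float64)
    Z = np.asarray(Z, dtype=np.float64)
    if Z.ndim == 1:
        Z = Z[:, None]  # treat a 1-D label vector as a single concept column
    n = X.shape[0]

    mu = X.mean(axis=0)
    Xc = X - mu
    Zc = Z - Z.mean(axis=0)

    sigma_xx = Xc.T @ Xc / n  # (d, d) covariance of the representations
    sigma_xz = Xc.T @ Zc / n  # (d, k) cross-covariance with the concept

    # Whitening W = (Sigma_xx^{1/2})^+ and its pseudo-inverse W^+ via an
    # eigendecomposition; directions with (near-)zero variance are dropped.
    evals, evecs = np.linalg.eigh(sigma_xx)
    keep = evals > eps * evals.max()
    V, L = evecs[:, keep], evals[keep]
    W = (V / np.sqrt(L)) @ V.T
    W_pinv = (V * np.sqrt(L)) @ V.T

    # Orthogonal projection onto the column space of W @ sigma_xz.
    U = W @ sigma_xz
    P = U @ np.linalg.pinv(U)

    # Row-vector form of x - W^+ P W (x - mu): W, P and W^+ are each
    # symmetric, so (W^+ P W)^T = W P W^+.
    M = W @ P @ W_pinv

    def erase(X_new):
        X_new = np.asarray(X_new, dtype=np.float64)
        return X_new - (X_new - mu) @ M

    return erase


# Usage on synthetic data with a binary concept planted along two axes.
rng = np.random.default_rng(0)
n, d = 2000, 16
z = rng.integers(0, 2, size=n)  # one binary concept label per token
X = rng.normal(size=(n, d))
X[:, 0] += 3.0 * z
X[:, 1] += 1.5 * z

erase = fit_leace_like_eraser(X, z)
X_erased = erase(X)

gap_before = X[z == 1].mean(axis=0) - X[z == 0].mean(axis=0)
gap_after = X_erased[z == 1].mean(axis=0) - X_erased[z == 0].mean(axis=0)
print(np.abs(gap_before).max(), np.abs(gap_after).max())
```

After erasure the cross-covariance between the hidden states and the label is (numerically) zero, so for a binary concept the two class means coincide and no linear probe can beat a constant predictor on the fitted data.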

norabelrose commented 1 year ago

So far we haven't done concept scrubbing experiments beyond the part-of-speech ones from the paper, so we'd be excited to see you or others try this out!

nickmitchko commented 1 year ago

Got it. I'll take a look and see if it's possible.

norabelrose commented 1 year ago

Yeah, I think the trickiest part might be getting it to work for concepts with sparse labels and/or weak supervision, where you don't have a label for every token. Something like this might help
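One simple way to handle sparse labels, assuming unlabeled tokens are marked with `-1` (a hypothetical convention), is to fit the eraser only on the labeled positions and then apply it to every token. To keep the sketch short and self-contained it swaps in a simpler eraser than LEACE: an orthogonal projection that removes the direction between the two class means. That equalizes the class means on the labeled data, but it generally distorts the representations more than LEACE, and it does nothing about noisy (weak) labels. The name `fit_on_sparse_labels` is hypothetical.

```python
import numpy as np


def fit_on_sparse_labels(X, labels):
    """Fit a binary-concept eraser using only the tokens that carry a label.

    X: (n, d) hidden states for every token.
    labels: (n,) ints; 1 or 0 for labeled tokens, -1 (hypothetical convention)
            for tokens without a label.
    Uses orthogonal projection onto the complement of the class-mean-difference
    direction, a simpler stand-in for LEACE.
    """
    X = np.asarray(X, dtype=np.float64)
    labels = np.asarray(labels)
    mask = labels >= 0  # fit only on supervised positions
    Xl, zl = X[mask], labels[mask]
    if not ((zl == 0).any() and (zl == 1).any()):
        raise ValueError("need labeled examples of both classes")

    v = Xl[zl == 1].mean(axis=0) - Xl[zl == 0].mean(axis=0)
    v /= np.linalg.norm(v)

    def erase(X_new):
        X_new = np.asarray(X_new, dtype=np.float64)
        # Remove the component along v from every token, labeled or not.
        return X_new - np.outer(X_new @ v, v)

    return erase


# Usage: only about 10% of tokens have a concept label.
rng = np.random.default_rng(0)
n, d = 5000, 16
z = rng.integers(0, 2, size=n)  # true concept (mostly unobserved)
X = rng.normal(size=(n, d))
X[:, 0] += 2.0 * z
labels = np.where(rng.random(n) < 0.1, z, -1)

erase = fit_on_sparse_labels(X, labels)
X_erased = erase(X)

full_before = X[z == 1, 0].mean() - X[z == 0, 0].mean()
full_after = X_erased[z == 1, 0].mean() - X_erased[z == 0, 0].mean()
print(full_before, full_after)
```

Because the direction is estimated from a small labeled subset, the gap on the full set of tokens shrinks but is not exactly zero; more labels give a better estimate.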