Closed nickmitchko closed 1 year ago
So far we haven't done concept scrubbing experiments beyond the part-of-speech ones from the paper, so we'd be excited to see you or others try this out!
Got it, I'll take a look and see if it's possible
Yeah I think the trickiest part might be getting it to work for concepts with sparse labels and/or weak supervision, where you don't have a label for each token. Something like this might help
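To make the sparse-label case concrete, here is a minimal sketch in plain NumPy. It is not this library's API and not the approach from the paper: it swaps in a simple mean-difference projection, and the `IGNORE` marker, `fit_direction`, and `erase` names are all hypothetical. The idea it shows is just that unlabeled tokens are masked out when fitting, while the resulting projection is applied to every token.

```python
import numpy as np

IGNORE = -100  # hypothetical marker meaning "this token has no label"

def fit_direction(hidden, labels, ignore_index=IGNORE):
    """Fit a unit direction separating two classes, using labeled tokens only.

    hidden: (n_tokens, d) array of hidden states.
    labels: (n_tokens,) int array with values 0, 1, or ignore_index.
    """
    mask = labels != ignore_index          # drop tokens without supervision
    x, y = hidden[mask], labels[mask]
    diff = x[y == 1].mean(axis=0) - x[y == 0].mean(axis=0)
    return diff / np.linalg.norm(diff)

def erase(hidden, direction):
    """Project every token (labeled or not) onto the complement of direction."""
    return hidden - np.outer(hidden @ direction, direction)

# Toy data: 200 tokens, only the first 40 carry a label.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(200, 16))
labels = np.full(200, IGNORE)
labels[:20] = 1
labels[20:40] = 0
hidden[:20, 0] += 3.0                      # inject a signal for class 1

direction = fit_direction(hidden, labels)
erased = erase(hidden, direction)
```

After erasure the two labeled class means coincide, so a linear probe can no longer separate the classes by their means. Whether this transfers to weakly supervised concepts in a real model is an open question here.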
Hi, would it be possible to provide a very simple sample showing how to patch a llama model to remove a single, specific text concept from it? The sample provided in the README is slightly confusing.