Open Skorchekd opened 4 months ago
You can find a notebook for a non-refusal use-case here: https://huggingface.co/failspy/Llama-3-8B-Instruct-MopeyMule/blob/main/MopeyMule-Induce-Melancholy.ipynb
Of course, you'll need to adjust it to your needs.
The "refusal" / "harmful" / "harmless" terminology in this library can be read as whatever behavior you want to ablate. That is, you want to achieve non-"refusal" responses to whatever you decide is a "harmful" prompt, where "refusal" is simply the behavior you don't want to see for a given prompt. This would require two datasets of polarized/opposite prompts.
Alternatively, as shown in the notebook above, you can also use a special system prompt.
Eventually we hope to change the terminology towards a general behavioral-ablation use-case.
Most of this is still very exploratory and, at best, experimental. If you find anything of interest, let us know!
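To make the "two datasets of opposite prompts" idea concrete, here is a rough sketch on synthetic data. It assumes the behavior is captured by a difference-of-means direction between the two sets of activations, which is one common way to do this and not necessarily what this library does internally. The function names `behavior_direction` and `ablate` are made up for illustration and are not part of the library's API; real activations would come from the model's residual stream rather than a random generator.

```python
import numpy as np

def behavior_direction(target_acts, baseline_acts):
    """Unit vector pointing from the baseline mean to the target-behavior mean."""
    diff = target_acts.mean(axis=0) - baseline_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate(acts, direction):
    """Remove each activation's component along `direction` (orthogonal projection)."""
    return acts - np.outer(acts @ direction, direction)

# Synthetic stand-ins for activations collected on the two prompt sets (assumption).
rng = np.random.default_rng(0)
hidden = 16
true_dir = np.zeros(hidden)
true_dir[0] = 1.0
baseline = rng.normal(size=(200, hidden))                   # e.g. "harmless" prompts
target = rng.normal(size=(200, hidden)) + 3.0 * true_dir    # e.g. "harmful" prompts

direction = behavior_direction(target, baseline)
cleaned = ablate(target, direction)

# After ablation, nothing is left along the estimated direction.
residual = np.abs(cleaned @ direction).max()
```

Swapping which dataset plays the "target" role flips the sign of the direction, which is why the terminology is really just "behavior you want gone" versus "everything else".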
Doesn't work... does it need a GPU?
Perhaps there could be configs that steer the model towards certain things, for example different personalities, different emotions, etc., preset in the code? Just an idea I had... very cool though!