Thank you very much for your great repo. It really helped me quickly get up to speed on the most relevant work on LLM unlearning. Building on that, I would like to ask about the main difference between LLM unlearning and safety fine-tuning.
I understand this question is not directly related to your repo; it is just something I have been thinking about. I would appreciate any suggestions you could offer.
Regarding LLM unlearning, I would encourage taking a look at the definition in Section 3 of [1] and this post on the Alignment Forum. Informally, the goal is to remove specific knowledge, behaviors, or capabilities from an LLM, so that the resulting model behaves as if it had never encountered the data related to the unlearning target. The original definition of machine unlearning in [2] reflects the same idea.
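To make the "behave as if it never saw the data" goal concrete, here is a toy numerical sketch of a common family of unlearning objectives: gradient ascent on the forget set combined with gradient descent on the retain set. Everything here (the 1-D logistic model, the variable names `forget_x`, `retain_x`, the learning rates) is purely illustrative and not taken from any specific paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad(w, x, y):
    # Gradient of binary cross-entropy w.r.t. scalar weight w.
    p = sigmoid(w * x)
    return np.mean((p - y) * x)

rng = np.random.default_rng(0)
# Retain data: positive inputs labeled 1 (the behavior we want to keep).
retain_x, retain_y = rng.normal(1.0, 0.3, 100), np.ones(100)
# Forget data: an undesired association (negative inputs also labeled 1).
forget_x, forget_y = rng.normal(-1.0, 0.3, 100), np.ones(100)

# Pre-train on both sets: the model absorbs the undesired association too.
w = 0.0
for _ in range(200):
    w -= 0.5 * (loss_grad(w, retain_x, retain_y) + loss_grad(w, forget_x, forget_y))

# Unlearning step: descend the retain loss, ascend the forget loss
# (note the minus sign on the forget-set gradient).
for _ in range(100):
    w -= 0.1 * (loss_grad(w, retain_x, retain_y) - loss_grad(w, forget_x, forget_y))

# Afterward the model fits the retain set while the forget-set
# association is suppressed, approximating a model trained on
# the retain set alone.
print(sigmoid(w * 1.0))   # high: retain behavior preserved
print(sigmoid(w * -1.0))  # low: forget-set association removed
```

In the LLM setting the same idea is applied to next-token cross-entropy over forget and retain corpora; the toy model just makes the opposing gradient terms easy to see.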
I am not familiar with the term "safety fine-tuning"; do you have a source for it? Personally, I consider unlearning one branch of methods within the broader field of alignment, since the goal is to reduce inherent model hazards (here, the unlearning targets) in specific contexts.