Thank you very much for your great repo. It really helped me quickly get up to speed on the most relevant work on LLM unlearning. Building on that, I would like to ask about the main difference between LLM unlearning and safety fine-tuning.
I understand this question is not directly related to your repo; it is just something I have been thinking about. I would appreciate any suggestions you could offer.
Regarding LLM unlearning, I would encourage taking a look at the definition in Section 3 of [1] and this post on the Alignment Forum. Informally, the goal is to remove specific knowledge, behaviors, or capabilities from an LLM, so that the resulting model behaves as if it had never encountered the data related to the unlearning target. The original definition of machine unlearning in [2] reflects the same idea.
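To make the "behave as if it never saw the data" goal concrete, here is a toy numerical sketch of a common family of unlearning objectives: gradient ascent on the forget set combined with gradient descent on the retain set. Everything here (the 1-D logistic model, the variable names `forget_x`, `retain_x`, the learning rates) is purely illustrative and not taken from any specific paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad(w, x, y):
    # Gradient of binary cross-entropy w.r.t. scalar weight w.
    p = sigmoid(w * x)
    return np.mean((p - y) * x)

rng = np.random.default_rng(0)
# Retain data: positive inputs labeled 1 (the behavior we want to keep).
retain_x, retain_y = rng.normal(1.0, 0.3, 100), np.ones(100)
# Forget data: an undesired association (negative inputs also labeled 1).
forget_x, forget_y = rng.normal(-1.0, 0.3, 100), np.ones(100)

# Pre-train on both sets: the model absorbs the undesired association too.
w = 0.0
for _ in range(200):
    w -= 0.5 * (loss_grad(w, retain_x, retain_y) + loss_grad(w, forget_x, forget_y))

# Unlearning step: descend the retain loss, ascend the forget loss
# (note the minus sign on the forget-set gradient).
for _ in range(100):
    w -= 0.1 * (loss_grad(w, retain_x, retain_y) - loss_grad(w, forget_x, forget_y))

# Afterward the model fits the retain set while the forget-set
# association is suppressed, approximating a model trained on
# the retain set alone.
print(sigmoid(w * 1.0))   # high: retain behavior preserved
print(sigmoid(w * -1.0))  # low: forget-set association removed
```

In the LLM setting the same idea is applied to next-token cross-entropy over forget and retain corpora; the toy model just makes the opposing gradient terms easy to see.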
I am not familiar with the term "safety fine-tuning"; do you have a source for it? Personally, I consider unlearning one branch of methods within the broader field of alignment, since the goal is to reduce inherent model hazards (here, the unlearning targets) in specific contexts.