chrisliu298 / awesome-llm-unlearning

A resource repository for machine unlearning in large language models
Apache License 2.0

About the difference between LLM unlearning and safety alignment or safety fine-tuning #1

Closed lucasliunju closed 1 month ago

lucasliunju commented 2 months ago

Hi,

Thank you very much for your great repo. It really helped me quickly learn about the most relevant work on LLM unlearning. I would like to ask about the main difference between LLM unlearning and safety fine-tuning.

I understand my question is not directly related to your repo; I am just thinking about it. It would be great if you could provide some suggestions.

chrisliu298 commented 2 months ago

Hi @lucasliunju. Thanks for your question.

  1. Regarding LLM unlearning, I would encourage taking a look at the definition in Section 3 of [1] and this post on the Alignment Forum. Informally, the goal is to remove specific knowledge, behaviors, or capabilities from an LLM, so that the resulting model behaves as if it had never encountered the data related to the unlearning target (see the sketch after this list). The original definition of machine unlearning in [2] reflects the same idea.
  2. I am not familiar with the term "safety fine-tuning"; do you have a source? I personally consider unlearning one branch of methods under the general field of "alignment," since the goal is to reduce inherent model hazards (in this case, the unlearning targets) in specific contexts.
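
To make the informal definition in point 1 concrete, here is a minimal sketch of one common unlearning recipe: gradient ascent on a forget set combined with a standard language-modeling loss on a retain set, so the model unlearns the target data while preserving its other behavior. This is not the specific method of [1] or [2]; the base model `gpt2`, the toy texts, the `alpha` weight, and the step count are all illustrative assumptions.

```python
# Minimal unlearning sketch: ascend on the forget set, descend on the retain set.
# All data, hyperparameters, and the base model below are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

forget_text = "<text containing the unlearning target>"   # hypothetical
retain_text = "<text the model should still handle well>"  # hypothetical
alpha = 1.0  # trade-off between forgetting and retention (assumed)

def lm_loss(text: str) -> torch.Tensor:
    batch = tokenizer(text, return_tensors="pt")
    # Using the input ids as labels gives the standard causal-LM loss.
    return model(**batch, labels=batch["input_ids"]).loss

model.train()
for step in range(10):  # a few illustrative update steps
    optimizer.zero_grad()
    # Negate the forget-set loss (gradient ascent) and keep the retain-set loss.
    loss = -lm_loss(forget_text) + alpha * lm_loss(retain_text)
    loss.backward()
    optimizer.step()
```

Methods in the literature typically refine this with regularization toward a reference model or preference-style objectives, but the basic trade-off between a forgetting term and a retention term is the common thread.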

[1] Liu et al., 2024. Rethinking Machine Unlearning for Large Language Models.
[2] Nguyen et al., 2022. A Survey of Machine Unlearning.