ChufanSuki / read-paper-and-code


ICML 2023 | Parameter-Level Soft-Masking for Continual Learning #75

Open ChufanSuki opened 5 months ago

ChufanSuki commented 5 months ago

https://arxiv.org/abs/2306.14775

https://github.com/UIC-Liu-Lab/SPG

ChufanSuki commented 5 months ago

Introduction

Catastrophic forgetting (CF): a network trained sequentially on tasks overwrites parameters important to earlier tasks, degrading their performance. Three main families of approaches:

Regularization-based: computes importance values of parameters (or of their gradients) on previous tasks, and adds a regularization term to the loss that restricts changes to the important parameters.

Regularization-based methods have difficulty preventing CF.
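A minimal sketch of the regularization idea (EWC-style; the function and variable names are my own, not from the paper): a per-parameter importance weight penalizes drift away from the values learned on previous tasks.

```python
import numpy as np

def regularized_loss(task_loss, params, old_params, importance, lam=1.0):
    """Add an importance-weighted quadratic penalty to the new task's loss.

    task_loss:  scalar loss on the current task
    old_params: parameter values after training on previous tasks
    importance: per-parameter importance (e.g. squared gradients / Fisher)
    lam:        regularization strength
    """
    penalty = np.sum(importance * (params - old_params) ** 2)
    return task_loss + lam * penalty
```

Parameters with high importance are strongly penalized for moving, while unimportant ones remain free to adapt to the new task.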


Memory-based: keeps a small memory buffer storing data from previous tasks and replays it while learning a new task to prevent CF.
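A minimal sketch of such a memory buffer (my own illustration, not the paper's implementation), using reservoir sampling so the buffer stays an unbiased sample of everything seen so far:

```python
import random

class ReplayBuffer:
    """Fixed-capacity buffer of past examples, filled by reservoir sampling."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []
        self.seen = 0  # total examples observed across all tasks

    def add(self, example):
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(example)
        else:
            # replace a stored example with probability capacity / seen
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = example

    def sample(self, k):
        # mini-batch of stored examples, replayed alongside new-task data
        return random.sample(self.data, min(k, len(self.data)))
```

During training on a new task, each mini-batch is augmented with `buffer.sample(k)` so gradients also reflect previous tasks.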


Parameter isolation: learns a mask that dedicates a sub-network to each task within a shared network, e.g., HAT, SupSup.

Drawback: poor knowledge transfer (KT) across tasks.
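The hard-masking idea can be sketched as follows (a simplified HAT-style illustration of my own; the real methods learn the masks): each task owns a binary mask over units, and gradients to parameters claimed by earlier tasks are blocked, which prevents forgetting but also prevents those parameters from being refined, hence the poor transfer.

```python
import numpy as np

def masked_forward(x, W, mask):
    # hard (binary) mask: only weights with mask == 1 are active for this task
    return (W * mask) @ x

def blocked_grad(grad, used_mask):
    # zero out gradients on parameters already claimed by previous tasks
    return grad * (1.0 - used_mask)
```

Because blocking is all-or-nothing, knowledge in the masked-out sub-network cannot be adjusted to help later tasks.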

ChufanSuki commented 5 months ago

Soft-masking of Parameter-level Gradient flow (SPG)

The importance of a parameter to a task is computed from its gradient; instead of the hard binary masks above, SPG uses this importance to soft-mask the gradient flow, so important parameters change little while unimportant ones keep learning.
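A minimal sketch of the soft-masking step (my own simplification, assuming importance is normalized absolute gradient magnitude; the paper's exact normalization differs): the update to each parameter is scaled by one minus its accumulated importance.

```python
import numpy as np

def param_importance(grads, eps=1e-12):
    # importance proportional to |gradient|, normalized to [0, 1]
    a = np.abs(grads)
    return a / (a.max() + eps)

def soft_masked_step(params, grads, importance, lr=0.1):
    # soft-mask the gradient: a parameter with importance 1 is frozen,
    # importance 0 learns at the full rate, values in between interpolate
    return params - lr * (1.0 - importance) * grads
```

Unlike a binary mask, no parameter is entirely walled off, which is what allows knowledge transfer across tasks while still protecting the most important parameters.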