NeurIPS 2021 | Achieving Forgetting Prevention and Knowledge Transfer in Continual Learning #118


https://arxiv.org/abs/2112.02706

CTR

CL-plugin is a full continual learning module designed to interact with a pre-trained model.

Simply freezing the pre-trained model to avoid forgetting is not the best choice, since a frozen model cannot adapt to new tasks.

CTR inserts the CL-plugin in two locations of each transformer layer of BERT.

During training, only the two CL-plugins and the classification heads are trained; the pre-trained BERT backbone itself is not updated.
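
A minimal sketch of this training setup, assuming a Hugging Face BERT backbone (the `CLPlugin` body here is just a bottleneck placeholder, not the paper's actual KSM/TSM module; names are my own):

```python
import torch
import torch.nn as nn
from transformers import BertModel

class CLPlugin(nn.Module):
    """Placeholder CL-plugin; in CTR this holds the KSM and TSM."""
    def __init__(self, d_e=768, bottleneck=128):
        super().__init__()
        self.down = nn.Linear(d_e, bottleneck)
        self.up = nn.Linear(bottleneck, d_e)

    def forward(self, h, task_id):
        # Residual connection: frozen BERT features pass through and are only adjusted.
        return h + self.up(torch.relu(self.down(h)))

bert = BertModel.from_pretrained("bert-base-uncased")
for p in bert.parameters():
    p.requires_grad = False  # the pre-trained backbone itself is not updated

# Two CL-plugins for one transformer layer (the full model repeats this per layer).
plugins = nn.ModuleList([CLPlugin(), CLPlugin()])
heads = nn.ModuleDict({"task_0": nn.Linear(768, 2)})  # one classification head per task

# Only plugin and head parameters are passed to the optimizer.
optimizer = torch.optim.AdamW(
    list(plugins.parameters()) + list(heads.parameters()), lr=1e-4
)
```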

CL-plugin

Inputs: hidden states $h^{(t)} \in \mathbb{R}^{d_t \times d_e}$, task ID $t$

$t$: current task ID; $d_t$: number of tokens; $d_e$: hidden dimension

The CL-plugin consists of a Knowledge Sharing Module (KSM) and a Task Specific Module (TSM).

KSM

Task Capsule Layer (TK-Layer): it prepares the low-level features derived from each task

Capsule: a 2-layer fully-connected network $f_i(\cdot)=\mathrm{MLP}_i(\cdot)$

Each capsule represents a task. Assuming we have learned $t$ tasks so far, the capsule for task $i \leq t$ is $p_i^{(t)}=f_i(h^{(t)})$.
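
A minimal sketch of such a task-capsule layer (class and argument names are my own):

```python
import torch.nn as nn

class TaskCapsuleLayer(nn.Module):
    """One 2-layer MLP capsule f_i per task; each maps the hidden states h^(t)
    to a task capsule p_i^(t) = f_i(h^(t))."""
    def __init__(self, num_tasks, d_e, d_k):
        super().__init__()
        self.capsules = nn.ModuleList([
            nn.Sequential(nn.Linear(d_e, d_k), nn.ReLU(), nn.Linear(d_k, d_k))
            for _ in range(num_tasks)
        ])

    def forward(self, h):
        # h: (batch, d_t, d_e) -> list of task capsules p_i, each (batch, d_t, d_k)
        return [f_i(h) for f_i in self.capsules]
```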

Transfer Routing
Pre-route Vector Generator (PVG)

$u_{j \vert i}^{(t)}=W_{ij}p_i^{(t)}$, where $W_{ij} \in \mathbb{R}^{d_s \times d_k}$.

$d_k$ and $d_s$ are the dimensions of task capsule $i$ and transfer capsule $j$, respectively.
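
A minimal sketch of the PVG, assuming one weight matrix $W_{ij}$ per (task capsule, transfer capsule) pair (names are my own):

```python
import torch
import torch.nn as nn

class PreRouteVectorGenerator(nn.Module):
    """Computes pre-route vectors u_{j|i} = W_{ij} p_i for every
    (task capsule i, transfer capsule j) pair."""
    def __init__(self, num_task_caps, num_transfer_caps, d_k, d_s):
        super().__init__()
        # One weight matrix W_{ij} in R^{d_s x d_k} per (i, j) pair.
        self.W = nn.Parameter(
            torch.randn(num_task_caps, num_transfer_caps, d_s, d_k) * 0.01
        )

    def forward(self, p):
        # p: (num_task_caps, batch, d_k) -> u: (num_task_caps, num_transfer_caps, batch, d_s)
        return torch.einsum("ijsk,ibk->ijbs", self.W, p)
```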

Similarity Estimator (SE)