chufanchen / read-paper-and-code


NeurIPS 2023 | Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality #102

Closed chufanchen closed 7 months ago

chufanchen commented 7 months ago

https://arxiv.org/abs/2310.07234

https://github.com/thu-ml/HiDe-Prompt

chufanchen commented 7 months ago

Motivation

The performance of previous prompt-based continual learning methods degrades under self-supervised pre-training.

Hierarchical Decomposition of Continual Learning Objective

$$P\left(\boldsymbol{x} \in \mathcal{X}_{\bar{i}, \bar{j}} \mid \mathcal{D}, \theta\right) \longrightarrow \max \left[P\left(\boldsymbol{x} \in \mathcal{X}_{\bar{i}, \bar{j}} \mid \mathcal{D}, \theta\right), P\left(\boldsymbol{x} \in \mathcal{X}^y \mid \mathcal{D}, \theta\right)\right]$$

Within-Task Prediction (WTP) $H_{\mathrm{WTP}}(\boldsymbol{x})=\mathcal{H}\left(\mathbf{1}_{\bar{j}},\left\{P\left(\boldsymbol{x} \in \mathcal{X}_{\bar{i},j} \mid \boldsymbol{x} \in \mathcal{X}_{\bar{i}}, \mathcal{D}, \theta\right)\right\}_j\right)$

Task-Identity Inference (TII) $H_{\mathrm{TII}}(\boldsymbol{x})=\mathcal{H}\left(\mathbf{1}_{\bar{i}},\left\{P\left(\boldsymbol{x} \in \mathcal{X}_i \mid \mathcal{D}, \theta\right)\right\}_i\right)$

Task-Adaptive Prediction (TAP) $H_{\mathrm{TAP}}(\boldsymbol{x})=\mathcal{H}\left(\mathbf{1}_{\bar{c}},\left\{P\left(\boldsymbol{x} \in \mathcal{X}^c \mid \mathcal{D}, \theta\right)\right\}_c\right)$
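To make the three quantities concrete, here is a minimal Python sketch. It is not taken from the HiDe-Prompt code. The probability vectors, the 2-task / 2-class layout, and the helper `one_hot_cross_entropy` are all hypothetical, chosen only for illustration. It relies on the fact that the cross-entropy between a one-hot target and a distribution reduces to the negative log-probability of the true index, and it assumes the joint probability factorizes by the chain rule into the WTP and TII terms.

```python
import math


def one_hot_cross_entropy(probs, true_index):
    """Cross-entropy H(1_k, probs) for a one-hot target at index k.

    With a one-hot target, the sum collapses to -log(probs[k]).
    """
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probs must sum to 1")
    return -math.log(probs[true_index])


# Hypothetical toy setup: 2 tasks, 2 classes per task, 4 classes overall.
# Ground truth: task i_bar = 0, within-task class j_bar = 1, global class c_bar = 1.
i_bar, j_bar, c_bar = 0, 1, 1

p_task = [0.8, 0.2]                    # P(x in X_i | D, theta), one entry per task
p_class_given_task = [0.3, 0.7]        # P(x in X_{i_bar, j} | x in X_{i_bar}, D, theta)
p_global_class = [0.1, 0.6, 0.2, 0.1]  # P(x in X^c | D, theta), one entry per class

h_tii = one_hot_cross_entropy(p_task, i_bar)
h_wtp = one_hot_cross_entropy(p_class_given_task, j_bar)
h_tap = one_hot_cross_entropy(p_global_class, c_bar)

# Chain rule: P(x in X_{i_bar, j_bar}) = P(class | task) * P(task),
# so -log of the joint probability equals H_WTP + H_TII.
p_joint = p_class_given_task[j_bar] * p_task[i_bar]

# The objective above takes the larger of the joint (task-then-class)
# probability and the direct class-level probability.
p_objective = max(p_joint, p_global_class[c_bar])

print(f"H_WTP={h_wtp:.4f}  H_TII={h_tii:.4f}  H_TAP={h_tap:.4f}")
print(f"joint={p_joint:.4f}  objective={p_objective:.4f}")
```

In this toy case the direct class-level prediction is slightly more confident than the product of the task and within-task predictions, so the `max` selects the TAP term.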
