JiazuoYu / MoE-Adapters4CL

Code for the paper "Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters" (CVPR 2024)

About CIL #9

Closed · boringKey closed this issue 1 month ago

boringKey commented 2 months ago

This is truly excellent work. However, I have a question. In the class-incremental setting, how does using only one router combined with two experts address catastrophic forgetting? It seems the experts can only retain knowledge of the current classes and cannot store knowledge of previous ones. I look forward to your explanation.

JiazuoYu commented 1 month ago

Hi, thanks for your attention to our work. For class-incremental learning (CIL) tasks, when only one router and two experts are used without a frozen activation strategy, repeatedly updating the same network parameters does indeed lead to forgetting. However, we were surprised to find that as the number of experts increases, the router tends to learn specific expert combinations for similar classes. We believe this reduces the parameter interference that arises when classes with very different distributions are trained jointly within the same group of experts, thereby alleviating forgetting. This finding also indirectly demonstrates that mixture-of-experts models have some inherent advantages in handling data from different distributions and in incremental learning tasks.
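
To illustrate the routing behavior described above, here is a minimal sketch of a router over a pool of adapter experts with top-k softmax gating. It is a generic example with hypothetical names (`MoEAdapter`, `bottleneck`, `top_k`), not the repository's actual implementation; it only shows how a router can learn per-sample expert combinations, so that similar classes reuse the same experts while dissimilar ones are handled by different experts.

```python
# Hypothetical sketch of a mixture-of-experts adapter; not the code from this repo.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEAdapter(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, bottleneck: int = 64, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small bottleneck adapter (down-project, nonlinearity, up-project).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, bottleneck), nn.GELU(), nn.Linear(bottleneck, dim))
            for _ in range(num_experts)
        )
        # The router scores each expert from the input feature.
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim). Route each sample to its top-k experts.
        logits = self.router(x)                         # (batch, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # pick k experts per sample
        weights = F.softmax(weights, dim=-1)            # normalize mixing weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k + 1] * self.experts[e](x[mask])
        return x + out                                  # residual adapter output
```

With more experts, samples from similar classes tend to be routed to the same small subset, so updates for a new task concentrate on a few experts and disturb the others less.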