arthurdouillard / incremental_learning.pytorch

A collection of incremental learning paper implementations including PODNet (ECCV20) and Ghost (CVPR-W21).
MIT License
383 stars 60 forks

question about 2021cvprw paper #62

Closed duboyuan closed 2 years ago

duboyuan commented 2 years ago

Hello, I want to follow up on this work (Insight From the Future for Continual Learning), but I couldn't find the relevant code entry point. Did I miss it?

arthurdouillard commented 2 years ago

Hi,

Yeah, in the code the Ghost model was originally called ZIL: https://github.com/arthurdouillard/incremental_learning.pytorch/blob/master/inclearn/models/zil.py

Note that the code isn't very pretty, sorry! Zero-shot models are quite hard to make work 🤷‍♂️

duboyuan commented 2 years ago

Thank you for your reply. I also think it is difficult, but I want to try. I hope I can communicate with you again.

duboyuan commented 2 years ago

Hi, sorry to bother you again. I noticed that the add_classes(self, n_classes) function in classifiers.py initializes the fully connected layer to all zeros. But the LwF algorithm (and similar ones) seems to need to keep the fully connected layer parameters from the previous task. Is this my misunderstanding, or did I miss the relevant implementation?

arthurdouillard commented 2 years ago

Hum, are you talking about the Cosine classifier?

After initializing the new weights to zero, it re-initializes them with kaiming init (https://github.com/arthurdouillard/incremental_learning.pytorch/blob/master/inclearn/lib/network/classifiers.py#L447).

And the old weights are not re-initialized in that case. Is that clearer?
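To make the pattern concrete (this is a minimal toy sketch, not the repo's actual `CosineClassifier` — the class and method names here are illustrative): each call to `add_classes` appends a freshly kaiming-initialized weight block, while the previously learned blocks are left untouched.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GrowingCosineClassifier(nn.Module):
    """Toy cosine classifier head that grows at each incremental task."""

    def __init__(self, features_dim: int):
        super().__init__()
        self.features_dim = features_dim
        # One parameter block per task, so old blocks are never re-created.
        self.weights = nn.ParameterList()

    def add_classes(self, n_classes: int):
        # Only the *new* block of weights is initialized (zeros, then kaiming);
        # the blocks from previous tasks keep their learned values.
        new_w = nn.Parameter(torch.zeros(n_classes, self.features_dim))
        nn.init.kaiming_normal_(new_w, nonlinearity="linear")
        self.weights.append(new_w)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Cosine similarity: normalize both features and class weights.
        w = torch.cat(list(self.weights), dim=0)
        return F.normalize(features, dim=1) @ F.normalize(w, dim=1).t()
```

After two tasks of, say, 5 and 3 classes, the head outputs 8 logits, and the first task's 5 weight rows are bit-for-bit the ones learned earlier.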

duboyuan commented 2 years ago

Thank you for your reply. I see that your implementation does indeed use a cosine classifier, with the new weights re-initialized at each task. But the original LwF paper uses a standard MLP layer as the classification head, so I looked at some other implementations. A typical approach is to define the last layer with all 100 output nodes when building the network (for CIFAR-100), and then mask the nodes belonging to other tasks when computing the loss. In my tests, I found some differences in the results between the two approaches. My questions are:

  1. Would re-initialization be a problem for algorithms that don't store samples?
  2. Do the two different implementations have a large impact on the results, as I observed in my experiments?
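For reference, the masking approach described above can be sketched like this (my own hedged sketch, not code from this repo or from any specific LwF implementation): the head is built with all 100 outputs up front, and logits of classes not yet seen are pushed to `-inf` so they contribute nothing to the softmax in the loss.

```python
import torch
import torch.nn.functional as F


def masked_cross_entropy(logits: torch.Tensor,
                         targets: torch.Tensor,
                         n_seen_classes: int) -> torch.Tensor:
    """Cross-entropy over a fixed-size head, masking not-yet-seen classes.

    `logits` has shape (batch, total_classes), e.g. 100 for CIFAR-100,
    but only the first `n_seen_classes` columns take part in the softmax.
    """
    mask = torch.full_like(logits, float("-inf"))
    mask[:, :n_seen_classes] = 0.0  # seen classes pass through unchanged
    return F.cross_entropy(logits + mask, targets)
```

With `-inf` added to the unseen columns, this is numerically equivalent to computing the cross-entropy on `logits[:, :n_seen_classes]` directly, so the fixed 100-way head behaves like a head that grows with the seen classes.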