GMvandeVen / continual-learning

PyTorch implementation of various methods for continual learning (XdG, EWC, SI, LwF, FROMP, DGR, BI-R, ER, A-GEM, iCaRL, Generative Classifier) in three different scenarios.

Joint training results different for different types of incremental learning? #25

Closed · toshi2k2 closed this issue 1 year ago

toshi2k2 commented 1 year ago

Isn't joint training defined as training done on all the data at the same time? In that case, shouldn't it be the same for all three scenarios of CL? However, the results from the code (and in the paper) are not the same. Is joint training defined differently?

GMvandeVen commented 1 year ago

Joint training is indeed training on all the data at the same time, but it gives different results for the three continual learning scenarios because in each scenario the network must learn a different mapping. For Split MNIST, the mappings the network is expected to learn in each scenario are illustrated in Figure 2 of the accompanying article (https://www.nature.com/articles/s42256-022-00568-3#Fig2). Hope this helps!
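
For concreteness, here is a minimal sketch (an illustration, not code from this repository) of how the prediction target for a given digit differs per scenario, assuming the usual Split MNIST setup of five contexts with digit pairs (0,1) through (8,9):

```python
# Minimal sketch (not from this repository): the target the network must
# predict for a given digit in each of the three scenarios, assuming the
# standard Split MNIST split into five contexts of two digits each.

def target(digit, scenario):
    context = digit // 2        # which of the five contexts the digit belongs to
    within = digit % 2          # 'within-context' label: first or second digit
    if scenario == "task":      # context identity is given at test time
        return context, within  # -> predict 1 of 2 classes, context known
    if scenario == "domain":    # context identity is not given
        return within           # -> predict 1 of 2 classes, shared across contexts
    if scenario == "class":     # context identity must be inferred as well
        return digit            # -> predict 1 of 10 classes
    raise ValueError(scenario)

for d in (3, 8):
    print(d, {s: target(d, s) for s in ("task", "domain", "class")})
# 3 {'task': (1, 1), 'domain': 1, 'class': 3}
# 8 {'task': (4, 0), 'domain': 0, 'class': 8}
```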

toshi2k2 commented 1 year ago

So, for joint training in the task-incremental and domain-incremental scenarios, the output size is equal to the 'within-context' label size (for the above example, it's 2), and for the class-incremental scenario it's the 'global-label' size, which is 10 in the above case. Is my understanding correct?

GMvandeVen commented 1 year ago

For domain- and class-incremental learning that is correct. For task-incremental learning the output size is typically taken to be equal to the 'global-label' size, with the provided context label being used to set to 'active' only the output units of the classes in the current context (i.e., to have a multi-head output layer).
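
A minimal PyTorch sketch of this idea (an illustration under the stated assumptions, not the repository's actual implementation): with a 10-unit output layer, the given context label selects which output units are 'active', and the loss is computed over those units only.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, n_classes = 4, 10
logits = torch.randn(batch, n_classes)   # stand-in for model(x), one unit per global class
y = torch.tensor([2, 3, 2, 3])           # global labels, all from context 1 in this example

# Task-incremental: the context label is given, so restrict the loss (and
# the predictions) to the output units of the current context's classes.
context = 1
active = [2 * context, 2 * context + 1]  # classes (2, 3) for context 1

# Map global labels to positions within the active set, then compute the
# cross-entropy over the active units only.
y_within = torch.tensor([active.index(int(t)) for t in y])
loss = F.cross_entropy(logits[:, active], y_within)

# Predictions are likewise taken over the active units, mapped back to
# global class ids.
pred = torch.tensor(active)[logits[:, active].argmax(dim=1)]
print(loss.item(), pred.tolist())
```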