Hi chrysts, thanks for your excellent work.
I am confused about how GeoDL is adapted to iCaRL. iCaRL minimizes the dissimilarity between the predictions of the old model and the new model, so its loss function is $L_{ce} + L_{KLDiv}(y_t, y_{t-1})$. GeoDL, in contrast, minimizes the dissimilarity between latent features. The main text says: "GeoDL improves the basic iCaRL method (without knowledge distillation) by 8%, 13%, and 15% for 5, 10, and 25 tasks, respectively." Does this mean the loss function of iCaRL+GeoDL is $L_{ce} + L_{GeoDL}(z_t, z_{t-1})$, with the prediction-level KL term removed?
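To make the question concrete, here is a minimal sketch of the two objectives as I currently understand them. All function names are my own and hypothetical. `feature_distillation` is only a cosine-distance placeholder standing in for $L_{GeoDL}$; it is not the geodesic-flow loss from the paper. I also assume the new model's logits have already been restricted to the old classes.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # L_ce for a single sample with an integer class label.
    return -math.log(softmax(logits)[label])

def kl_distillation(new_logits, old_logits):
    # L_KLDiv(y_t, y_{t-1}): KL(old || new) between the two models' predictions.
    # Assumption: both logit vectors cover the same (old) classes.
    p_old = softmax(old_logits)
    p_new = softmax(new_logits)
    return sum(p * math.log(p / q) for p, q in zip(p_old, p_new))

def feature_distillation(z_new, z_old):
    # Placeholder for L_GeoDL(z_t, z_{t-1}): 1 - cosine similarity.
    # This is NOT the geodesic-flow similarity used by GeoDL.
    dot = sum(a * b for a, b in zip(z_new, z_old))
    norm = math.sqrt(sum(a * a for a in z_new)) * math.sqrt(sum(b * b for b in z_old))
    return 1.0 - dot / norm

def loss_prediction_level(new_logits, old_logits, label):
    # Reading 1: L_ce + L_KLDiv(y_t, y_{t-1}), distillation on predictions.
    return cross_entropy(new_logits, label) + kl_distillation(new_logits, old_logits)

def loss_feature_level(new_logits, label, z_new, z_old):
    # Reading 2: L_ce + L_GeoDL(z_t, z_{t-1}), distillation on latent features.
    return cross_entropy(new_logits, label) + feature_distillation(z_new, z_old)
```

If the second reading is correct, the old model's predictions $y_{t-1}$ would not enter the training loss at all, and only its features $z_{t-1}$ would.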