Closed DavideA closed 5 years ago
Thanks for your comment. Yes, you are right: this is indeed a discrepancy between the code and the method description in our paper, according to which the line of code you refer to should be left out.
It thus seems odd that this line of code ended up there. However, when I went back to the original paper proposing "online EWC" (https://arxiv.org/pdf/1805.06370.pdf), I realized that in their method description they actually do include this additional scaling by gamma (Eq. (8) in Section 4) when computing the regularization loss.
Note that because this additional scaling by gamma has the same effect as the hyperparameter lambda, whether or not it is included should not have any real effect on the final result when lambda is selected by a hyperparameter search. (It should just result in a slightly different selected value for lambda.)
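To make the equivalence concrete, here is a toy sketch (not the repository's actual code; the function name `ewc_penalty` and all values are made up for illustration). The quadratic EWC penalty has the form lambda/2 * sum_i F_i * (theta_i - theta_star_i)^2, so an extra factor of gamma in the loss is exactly the same as using a rescaled lambda:

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.random(5)                # toy running Fisher estimate
theta = rng.normal(size=5)       # current parameters
theta_star = rng.normal(size=5)  # parameters after the previous task

def ewc_penalty(lam, gamma=1.0):
    """Quadratic EWC penalty, optionally with the extra gamma scaling."""
    return 0.5 * lam * gamma * np.sum(F * (theta - theta_star) ** 2)

lam, gamma = 100.0, 0.9
# Scaling the loss by gamma is identical to absorbing gamma into lambda:
assert np.isclose(ewc_penalty(lam, gamma), ewc_penalty(lam * gamma))
```

So a hyperparameter search over lambda would simply settle on `lambda / gamma` instead of `lambda`, leaving the effective penalty unchanged.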
Thank you for pointing this out to me! Sorry for the confusion; I hope this helped to clear it up. In the next update I will, one way or another, make things consistent.
Thank you,
I now understand where it comes from.
Closing. Best, D
Hi, and thank you for your work.
I have a question about the implementation of Online EWC. Specifically, I refer to the following line of code: https://github.com/GMvandeVen/continual-learning/blob/ff0e03cb913ac0dea4fc59058968b1e6784decfd/continual_learner.py#L161
To the best of my understanding, gamma is a decay factor applied to the prior Fisher matrix when updating its estimate, but it shouldn't affect how the regularization loss is computed (Eq. (7) in Sec. A.2.2).
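For context, the decay role of gamma described above can be sketched as follows (a hypothetical toy version, not the repository's code; variable names and values are made up): in online EWC, the running Fisher estimate is decayed by gamma before the Fisher information of the newest task is added.

```python
import numpy as np

gamma = 0.9
fisher_running = np.ones(4)        # toy running estimate from past tasks
fisher_new_task = np.full(4, 2.0)  # toy Fisher for the latest task

# Online EWC update: decay the old estimate, then add the new Fisher.
fisher_running = gamma * fisher_running + fisher_new_task
```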
Could you please provide intuition on this matter? Best, D