ContinualAI / avalanche

Avalanche: an End-to-End Library for Continual Learning based on PyTorch.
http://avalanche.continualai.org
MIT License

Synaptic Intelligence implementation differs from the paper. #1226

Open yonatansverdlov opened 1 year ago

yonatansverdlov commented 1 year ago

Hi there. I understand that your implementation follows "Continuous Learning in Single-Incremental-Task Scenarios". There, instead of using the Fisher Information matrix, the authors use an online version: F_k = (sum DL_k) / (T_k + eps)^2, where DL_k := (theta_new - theta_old) * theta.grad. My concern is that, in contrast to the Fisher Information matrix, DL_k is not necessarily positive: some entries may be negative, so the regularization factor itself may become negative. What do you think?
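To make the concern concrete, here is a minimal sketch of the online importance term as written above. All names (`online_importance`, `delta_steps`, etc.) are illustrative stand-ins, not Avalanche's actual internals; it only shows that a step where a parameter moves against its gradient yields a negative entry.

```python
import torch

def online_importance(delta_steps, grads, total_drift, eps=1e-3):
    """Accumulate DL_k = (theta_new - theta_old) * grad per step and
    normalize by the squared total drift, as in the formula above.
    delta_steps, grads: lists of per-step tensors; total_drift: theta_T - theta_0."""
    acc = torch.zeros_like(total_drift)
    for d, g in zip(delta_steps, grads):
        acc += d * g  # DL_k: negative whenever d and g have opposite signs
    return acc / (total_drift ** 2 + eps)

# One step: the second parameter moves against its gradient,
# so its importance entry comes out negative.
deltas = [torch.tensor([0.1, -0.2])]
grads = [torch.tensor([0.5, 0.5])]
drift = torch.tensor([0.1, -0.2])
imp = online_importance(deltas, grads, drift)
```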

yonatansverdlov commented 1 year ago

In addition, in the current implementation the regularization is always applied. Why is there no check like `if clock.exp_counter == 0: return 0.0`, otherwise return the SI loss (as in EWC)?
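For reference, a sketch of the guard being suggested. This is not Avalanche's actual code; `exp_counter` and the penalty form are stand-ins, with the penalty written in the usual quadratic-drift shape.

```python
import torch

def si_penalty(exp_counter, importance, theta, theta_old, lambda_=1.0):
    # Before the first experience has finished, there is nothing to
    # protect yet: short-circuit to a zero penalty.
    if exp_counter == 0:
        return torch.tensor(0.0)
    # Otherwise, the standard importance-weighted quadratic penalty.
    return lambda_ * torch.sum(importance * (theta - theta_old) ** 2)

zero = si_penalty(0, torch.ones(3), torch.rand(3), torch.zeros(3))
later = si_penalty(1, torch.ones(3), torch.ones(3), torch.zeros(3))
```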

AntonioCarta commented 1 year ago

We can add the original SI implementation. I think the docs are currently quite clear that this is not the original SI implementation, but we can surely expand on that.

why is there no check like `if clock.exp_counter == 0: return 0.0`, otherwise return the SI loss (as in EWC)?

This seems incorrect. @AndreaCossu what do you think?

AndreaCossu commented 1 year ago

We can add an explicit check, since it makes the code clearer. However, the current version should already work (at the cost of some extra computation): during the first experience, ewc_data[1] has not yet been updated (it is updated at the end of each experience) and it is initialized as a zero tensor. Therefore, the torch.dot operation in the compute_ewc_loss method returns a penalty of 0.
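The argument above can be sketched in a few lines. Variable names mirror the description (a zero-initialized importance vector dotted with the squared parameter drift), not the exact Avalanche source:

```python
import torch

# ewc_data[1] before the first experience ends: all zeros.
importance = torch.zeros(4)

# Squared drift of the parameters from their saved values.
sq_drift = (torch.rand(4) - torch.rand(4)) ** 2

# With a zero importance vector, the dot product vanishes,
# so the penalty is 0 during the first experience.
penalty = torch.dot(importance, sq_drift)
```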