yonatansverdlov opened 1 year ago
In addition, in the current implementation the regularization is always applied. Why is there no check like `if clock.exp_counter == 0: return 0.0`, otherwise return the SI loss (as in EWC)?
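The suggested guard could look like the sketch below. The `clock.exp_counter` name follows the Avalanche plugin API mentioned in the thread; `compute_si_penalty` is a hypothetical stand-in for the actual SI loss computation, not a real function in the library:

```python
def si_penalty(clock, compute_si_penalty):
    """Return the SI regularization term, skipping the first experience.

    `clock.exp_counter` counts completed experiences; during the very
    first one no importances have been accumulated yet, so the penalty
    is 0. `compute_si_penalty` is a hypothetical callable that would
    produce the actual SI loss.
    """
    if clock.exp_counter == 0:
        return 0.0
    return compute_si_penalty()
```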
We can add the original SI implementation. I think the docs are already quite clear that this implementation is not the original SI, but we can surely expand on it.
> Why is there no check like `if clock.exp_counter == 0: return 0.0`, otherwise return the SI loss (as in EWC)?
This seems incorrect. @AndreaCossu what do you think?
We can add an explicit check, since it makes the code clearer. However, the current version should already work correctly (with some extra computation involved). During the first experience `ewc_data[1]` has not yet been updated (it is updated at the end of each experience) and it is initialized as a zero tensor. Therefore, the `torch.dot` operation in the `compute_ewc_loss` method will return a penalty of 0.
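The argument above can be checked with a small sketch. This uses `numpy.dot` to mimic the `torch.dot` call, and the penalty shape (`0.5 * lam * dot(importances, (theta - theta_old)**2)`) is the standard EWC form, not the library's exact code:

```python
import numpy as np

def compute_penalty(importances, theta, theta_old, lam=1.0):
    """EWC-style penalty; `importances` plays the role of ewc_data[1]."""
    return 0.5 * lam * np.dot(importances, (theta - theta_old) ** 2)

# Before the first experience ends, ewc_data[1] is still the zero
# tensor it was initialized with, so the dot product vanishes.
importances = np.zeros(4)
theta = np.array([0.5, -1.0, 2.0, 0.1])
theta_old = np.zeros(4)
print(compute_penalty(importances, theta, theta_old))  # 0.0
```

So the explicit `exp_counter` check would only save the (cheap) dot product, not change the result.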
Hi there. I understand that you took your implementation from "Continuous Learning in Single-Incremental-Task Scenarios". The authors argue that instead of using the Fisher information matrix, one can use an online version: F_k = (sum DL_k) / (T_k + eps)^2, where DL_k := (theta_new - theta_old) * theta.grad. My only concern is that, in contrast to the Fisher information matrix, DL_k is not necessarily positive: some entries may be negative, so the regularization factor may be negative. What do you think?
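To make the concern concrete, here is a small numeric sketch (the trajectory, gradient, and eps values are made up for illustration) showing that the per-parameter contribution DL_k = (theta_new - theta_old) * grad can indeed be negative, which then makes the resulting importance weight negative:

```python
import numpy as np

eps = 1e-3  # damping term in the denominator (assumed value)

# Made-up per-parameter trajectory over one experience.
theta_old = np.array([0.0,  0.0])
theta_new = np.array([0.5, -0.3])
grad      = np.array([0.2,  0.4])   # gradient w.r.t. each parameter

# DL_k from the formula above: sign depends on whether the parameter
# moved with or against its gradient.
DL = (theta_new - theta_old) * grad   # [0.1, -0.12]

# Online importance: a negative DL_k entry gives a negative
# regularization weight for that parameter.
T = theta_new - theta_old
F = DL / (T + eps) ** 2
print(DL, F)
```

With a negative weight, the penalty term would actually reward moving that parameter away from its old value, which is the issue being raised.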