Closed: jimmykimmy68 closed this issue 9 months ago
Hmm, good question. I confess that I haven't looked at this for a long time, and even when I did, I didn't look at the maths too deeply.
I think that the phrase "log-likelihood" in the code is wrong, but the method implemented is correct. We're interested in how sensitive each output is to changes in each weight, but we don't care what the actual training gradient would be.
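Concretely, and hedging that this is just my reading rather than something derived from the code, the per-parameter quantity being accumulated is the expected squared gradient of the model's own log-output, not the gradient of the training loss:

```latex
% Diagonal Fisher entry for parameter \theta_i: the expectation is taken over
% the model's own predictive distribution, not over the training labels.
F_i \;=\; \mathbb{E}_{x \sim \mathcal{D}}\,
          \mathbb{E}_{y \sim p_\theta(y \mid x)}
          \left[ \left( \frac{\partial \log p_\theta(y \mid x)}{\partial \theta_i} \right)^{2} \right]
```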
I'm happy to be corrected though!
Hi, thanks for the reply.
I think calculating the log-likelihood would require both the ground-truth label and the predicted probabilities (i.e., model(data)).
I will look into this and your implementation in more detail.
Thank you!
For any other researchers who might be interested in this issue: the calculation of the true Fisher requires the ground-truth labels. For computational efficiency, however, most existing implementations (including Daniel's repository) use the empirical Fisher, which dispenses with the ground-truth labels.
Refer to: Chaudhry, Arslan, Puneet K. Dokania, Thalaiyasingam Ajanthan, and Philip H. S. Torr. "Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence." In Proceedings of the European Conference on Computer Vision (ECCV), pp. 532-547, 2018.
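To make the distinction concrete, here is a minimal PyTorch sketch (not the repository's actual code; `diag_fisher`, `loader`, and the assumption that the model returns raw logits are all mine) of the two estimates: one that evaluates the log-likelihood at the ground-truth labels, and one that instead draws the label from the model's own softmax output and therefore never touches the ground-truth labels.

```python
import torch
import torch.nn.functional as F

def diag_fisher(model, loader, use_true_labels, device="cpu"):
    """Accumulate per-example squared gradients of log p(y | x, theta)
    into a diagonal Fisher estimate.

    use_true_labels=True  -> y is the ground-truth label from the dataset.
    use_true_labels=False -> y is sampled from the model's own softmax
                             output, so no ground-truth labels are needed.
    """
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad}
    model.eval()
    count = 0
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        for i in range(data.size(0)):
            model.zero_grad()
            # Assumes the model returns raw logits.
            log_probs = F.log_softmax(model(data[i].unsqueeze(0)), dim=1)
            if use_true_labels:
                y = target[i]
            else:
                # Draw the label from the model's predictive distribution.
                y = torch.multinomial(log_probs.exp(), 1).squeeze()
            log_probs[0, y].backward()
            for n, p in model.named_parameters():
                if p.grad is not None:
                    fisher[n] += p.grad.detach() ** 2
            count += 1
    return {n: f / count for n, f in fisher.items()}
```

The only difference between the two variants is which label the log-likelihood is evaluated at; everything else (per-example squared gradients, averaged over the data) is identical.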
Hi, thanks for sharing a great implementation of EWC!
I have a question about the Fisher information function (i.e., def fisher_matrix(model, dataset, samples):), as follows.
In the calculation of the Fisher information, output = model(data) provides the softmax outputs. But then, according to your implementation, the log-likelihood is calculated simply as the log of the softmax output. Doesn't the likelihood calculation require the true label, though? I thought computing the likelihood p(y|x,\theta) would require the true label information (i.e., 'labels' in the implementation).
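For concreteness, this is the distinction I mean, sketched in PyTorch with the same names as above (`model`, `data`, `labels`); it is not taken from your repository:

```python
import torch
import torch.nn.functional as F

# As I read the implementation: model(data) gives the softmax probabilities and
# the "log-likelihood" is simply their log, with no label involved.
probs = model(data)                        # shape: (batch, num_classes)
log_probs = torch.log(probs)

# What I would expect for log p(y|x,\theta): the entry of log_probs selected by
# the ground-truth label of each example.
log_likelihood = log_probs.gather(1, labels.unsqueeze(1)).squeeze(1)

# Equivalently, the mean negative log-likelihood is the usual cross-entropy loss.
nll = F.nll_loss(log_probs, labels)
```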
Thank you