JieDengsc opened 3 weeks ago
Hi @JieDengsc, Mammoth is not really designed to handle regression, but depending on your use case it may be easy to adapt.
Since the task is regression, you probably want to define a domain-il task, since without class labels you wouldn't know how to split the data into separate tasks. Taking the "perm-mnist" dataset as an example (in `datasets/perm_mnist.py`), you could create a new file `datasets/<your_dataset>.py` and in it define a class that inherits from `ContinualDataset` and defines:

- `NAME`: the name of your dataset.
- `SETTING`: this should just be set to `domain-il`.
- `N_CLASSES_PER_TASK`: set it to 1, so that your backbone outputs just one logit.
- `SIZE`: the size of your images, as a tuple.
- `get_data_loaders`: create and return the full datasets. These should return a tuple `(sample, label, not_aug_sample)` for the train set and `(sample, label)` for the test set. The `not_aug_sample` is fundamental for methods that use a buffer (such as DER++). The `get_data_loaders` function is called once per task, so following `perm-mnist` you could handle different tasks here by defining a different transform per task. Make sure you call `store_masked_loaders` at the end.
- `get_backbone`: return the backbone architecture that will be optimized for your task.
- `get_loss`: return `F.mse_loss` instead of the cross-entropy found in the other datasets.

Besides this, you will need to modify:

- the `evaluate` function in `utils/training.py`, since it currently only supports accuracy;
- `utils/training.py`, to avoid casting the labels to long (this was done to prevent errors on Windows).

If you don't want a "domain-il" setting and want to split the data according to some other policy, I still suggest defining a "domain-il" dataset and splitting the data in `get_data_loaders`.
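A minimal sketch of the steps above (everything here is hypothetical: the class and file names are placeholders, the real `ContinualDataset` base class and `store_masked_loaders` live in Mammoth, and the toy datasets only illustrate the tuple contract):

```python
import numpy as np

class RegressionTrainSet:
    """Toy train set: yields (sample, label, not_aug_sample) 3-tuples."""
    def __init__(self, xs, ys):
        self.xs, self.ys = xs, ys

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, i):
        sample = self.xs[i]          # would normally be the augmented view
        not_aug_sample = self.xs[i]  # kept un-augmented for buffer methods (e.g. DER++)
        return sample, self.ys[i], not_aug_sample

class RegressionTestSet:
    """Toy test set: yields (sample, label) 2-tuples."""
    def __init__(self, xs, ys):
        self.xs, self.ys = xs, ys

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, i):
        return self.xs[i], self.ys[i]

class MyRegressionDataset:  # in Mammoth this would subclass ContinualDataset
    NAME = 'my-regression'       # hypothetical dataset name
    SETTING = 'domain-il'
    N_CLASSES_PER_TASK = 1       # a single output logit = the regression target
    SIZE = (32, 32)              # image size, as a tuple

    def get_data_loaders(self):
        # Called once per task: build the train/test sets (e.g. with a
        # task-specific transform, as perm-mnist does) and hand them to
        # Mammoth's store_masked_loaders -- omitted in this sketch.
        xs = np.zeros((4, *self.SIZE), dtype=np.float32)
        ys = np.zeros(4, dtype=np.float32)
        return RegressionTrainSet(xs, ys), RegressionTestSet(xs, ys)
```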
We plan in the future to introduce some regression tasks. Let me know if yours is publicly available so that we may take a look into it.
Hi @loribonna, thanks for your reply and suggestion, I will try it.
In addition, in `ewc_on.py`, why do you multiply by `exp_cond_prob` when computing the Fisher matrix?

```python
fish += exp_cond_prob * self.net.get_grads() ** 2
```

According to the paper, one only needs to sum the squares of the gradients and take the average at the end. Please let me know if I've misunderstood anything. Thanks a lot.
The question is a bit of a rabbit hole and I'm not an expert on this, but the reason is that the Fisher information matrix is computed as the expectation, under the model's predictive distribution, of the squared gradients, so you need to weight them by `p(y|x)`; that is why we take the `exp` of the (log-likelihood) loss.
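To make the weighting concrete, here is a small self-contained sketch (plain NumPy on a toy softmax classifier, not Mammoth's actual code) of the diagonal Fisher information, where each squared gradient of `log p(y|x)` is weighted by `p(y|x)` itself, which is the role `exp_cond_prob` plays:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fisher_diag(W, x):
    """Diagonal Fisher information for a linear softmax classifier.

    F = E_{y ~ p(y|x)} [ (d log p(y|x) / dW)^2 ]:
    each squared gradient is weighted by the model's own p(y|x).
    """
    p = softmax(W @ x)                  # predictive distribution over classes
    fish = np.zeros_like(W)
    for y in range(len(p)):
        onehot = np.zeros_like(p)
        onehot[y] = 1.0
        grad = np.outer(onehot - p, x)  # d log p(y|x) / dW for class y
        fish += p[y] * grad ** 2        # weight by p(y|x)
    return fish
```

Dropping the `p[y]` factor would give the empirical Fisher over a single label rather than the expectation over the model's predictions.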
I suggest you check out this paper and this discussion for more info.
Edit: in your regression scenario, while you could reuse the same code from EwC, I don't think the math would check out.
Specifically, I don't know how to modify the code, because my task is now regression.