Reproducing BI+SI method

valeriya-khan commented 1 year ago

Hi! First, I wanted to say thank you for your great work. Your paper was really interesting to read, and your code is really well thought through. I wanted to ask how I can reproduce the results for the brain-inspired method combined with synaptic intelligence. I tried the following command, which gave me 18% accuracy:

    python main.py --experiment=CIFAR100 --scenario=class --brain-inspired --SI --seed=0 --pre-convE --freeze-convE --seed-to-ltag --time

I also tried the following, with --reg-strength set to 10^8 and --dg-prop=0.6, as suggested in the generative-classifiers paper if I understood correctly:

    python main.py --experiment=CIFAR100 --scenario=class --brain-inspired --SI --seed=0 --pre-convE --seed-to-ltag --time --reg-strength=100000000 --dg-prop=0.6

However, even the training accuracy dropped to 0 for context 2 during training. Can you suggest the correct arguments for the main.py script to reproduce the results from "Brain-inspired replay for continual learning with artificial neural networks"?

GMvandeVen commented 1 year ago

Hi, thanks for your interest in the code! To reproduce the results from BI-R + SI as reported in the papers "Brain-inspired replay for continual learning with artificial neural networks" and "Class-incremental learning with generative classifiers", it is best to use not this repository, but the repository accompanying the brain-inspired replay paper (https://github.com/GMvandeVen/brain-inspired-replay). In that repository, the following command can be used to run the same BI-R + SI experiment on the class-incremental version of Split-CIFAR100 as reported in the above two papers:

    python main_cl.py --experiment=CIFAR100 --scenario=class --brain-inspired --replay=generative --si --dg-prop=0.6 --c=100000000

In this repository (https://github.com/GMvandeVen/continual-learning) a few things are implemented slightly differently compared to the brain-inspired replay paper. For example, the default in this repository is to run class-incremental learning experiments with an output layer in which the output units of all classes are always set to active, while in the brain-inspired replay paper an "expanding head" was used. (See for example the explanation under the header "BI-R" in the methods section (top of p.14) of this paper.) In the paper accompanying this repository I only tested BI-R by itself, and not combined with SI. I expect that the second experiment you describe fails because using SI with a very high regularization strength might be problematic in a class-incremental learning experiment in which all units are always set to active (while it is OK with an expanding head).
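To make the difference between the two output-layer settings concrete, here is a minimal sketch in plain Python. It is not code from either repository: the helper names `active_classes` and `masked_softmax` are hypothetical, the mode string "all-so-far" mirrors the flag value mentioned later in this thread, and "all" is an assumed label for the always-active setting.

```python
import math


def active_classes(context, classes_per_context, total_contexts, mode):
    """Return the output-unit indices treated as active in `context` (1-based).

    Hypothetical helper: "all" keeps every unit active from the start,
    "all-so-far" is an expanding head that grows with each context.
    """
    if mode == "all":
        return list(range(classes_per_context * total_contexts))
    if mode == "all-so-far":
        return list(range(classes_per_context * context))
    raise ValueError(f"unknown mode: {mode}")


def masked_softmax(logits, active):
    """Softmax over the active units only; inactive units get probability 0."""
    active_set = set(active)
    peak = max(logits[i] for i in active)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) if i in active_set else 0.0
            for i, x in enumerate(logits)]
    total = sum(exps)
    return [e / total for e in exps]


# Toy setup (made-up numbers): 3 contexts with 2 classes each, currently in context 2.
logits = [2.0, 1.0, 0.5, 3.0, 0.0, -1.0]
expanding = active_classes(2, 2, 3, "all-so-far")
everything = active_classes(2, 2, 3, "all")

probs_expanding = masked_softmax(logits, expanding)
probs_all = masked_softmax(logits, everything)
```

With the expanding head, the units of classes not yet seen (indices 4 and 5 here) receive no probability mass and therefore no gradient, whereas with all units active they take part in the softmax from the first context onward.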

Hope this helps!

valeriya-khan commented 1 year ago

Thank you very much for such a detailed answer :) Can I ask why you excluded BI-R + SI from the results table of the new paper? It gave one of the best results if I am not mistaken, especially among generative methods. I also tried to reproduce BI-R + SI in the new repository, with the one feature extractor provided in the repository and with the argument --active-classes="all-so-far". But the results differ a lot. What can be the reason? Was something else changed that is not mentioned in your paper? Thank you very much for your time :)

GMvandeVen commented 1 year ago

Regarding the first part of your comment, that's a good question. In this new paper (although the preprint of this new paper is older than the paper on brain-inspired replay) I didn't include BI-R + SI in the comparison because the goal of the experiments in this paper is not to verify/champion a method as achieving state-of-the-art performance, but rather the goal is to compare the performance of different computational strategies for continual learning, and to do that on each of the three continual learning scenarios. To do this I tried to select for each strategy a few representative example methods. As the approach BI-R + SI combines two of those strategies, it wasn't suitable for this comparison. (But if you are interested in doing as well as possible on some continual learning problem, it might indeed often be best to combine multiple strategies.)

Regarding the second part of your comment, it seems you are right. Thank you for pointing this out. It indeed seems to be the case that also when using the argument --active-classes="all-so-far", the performance of BI-R + SI with the code in this repository is somewhat lower than the performance of BI-R + SI with the code in the repository of the brain-inspired replay paper. I will try to figure out what is causing this difference!

valeriya-khan commented 1 year ago

Thank you very much for the answer! Can I leave this issue open as a means of communication? If you find what causes the difference, I would be happy to hear from you :)

GMvandeVen commented 1 year ago

Yes, please leave the issue open. I intend to get back to this when I figure it out!

GMvandeVen commented 1 year ago

Hi, I found one difference in the implementation of BI-R + SI between this repository and the repository of the BI-R paper, which seems to explain at least most of the difference in results you got. In the repository of the BI-R paper the method SI is only applied to the layers of the classifier (so not to the layers of the decoder network), while in this repository the method SI is by default applied to all the layers of the network (so also to the layers of the decoder network). The lines in the repository of the BI-R paper where this is specified are here:

    https://github.com/GMvandeVen/brain-inspired-replay/blob/1a030f75666c656416e1ca02466758ca32cf2fe4/train.py#L296-L318

To mimic this behavior in this repository, you could replace the following line:

    https://github.com/GMvandeVen/continual-learning/blob/b4bd69a1b5c3c93eccb303591ee35fda0310aafa/models/cl/continual_learner.py#L19-L20

by

    self.param_list = [self.convE.named_parameters, self.fcE.named_parameters, self.classifier.named_parameters]

Note that the approach BI-R + SI can also work quite well when SI is applied to all the layers of the network, but this setting has different optimal hyperparameter values (in particular the hyperparameter --dg-prop should be lower). Hope this helps!
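As an illustration of why the contents of `param_list` matter, the sketch below uses toy stand-ins rather than the repository's classes. `ToyLayer` and `ToyModel` are hypothetical, the parameter values and importances are made up, and the penalty only follows the general quadratic shape of SI's surrogate loss (importance times squared change); real SI importances are accumulated during training and are not computed here.

```python
class ToyLayer:
    """Hypothetical stand-in for a network module holding named parameters."""

    def __init__(self, prefix, values):
        self._params = {f"{prefix}.{name}": value for name, value in values.items()}

    def named_parameters(self):
        # Mimics the (name, parameter) pairs yielded by torch.nn.Module.named_parameters.
        return iter(self._params.items())


class ToyModel:
    """Toy model whose param_list decides which parameters are regularized."""

    def __init__(self, regularize_decoder):
        self.convE = ToyLayer("convE", {"weight": 1.5})
        self.fcE = ToyLayer("fcE", {"weight": 0.5})
        self.classifier = ToyLayer("classifier", {"weight": -1.0})
        self.decoder = ToyLayer("decoder", {"weight": 2.0})
        # A list of callables, as in the replacement line quoted above.
        self.param_list = [self.convE.named_parameters,
                           self.fcE.named_parameters,
                           self.classifier.named_parameters]
        if regularize_decoder:
            self.param_list.append(self.decoder.named_parameters)

    def regularized_names(self):
        return [name for gen in self.param_list for name, _ in gen()]

    def quadratic_penalty(self, previous, importance):
        """Sum of importance * (theta - theta_previous)^2 over the selected parameters."""
        return sum(importance[name] * (value - previous[name]) ** 2
                   for gen in self.param_list
                   for name, value in gen())


names = ["convE.weight", "fcE.weight", "classifier.weight", "decoder.weight"]
previous = {name: 0.0 for name in names}    # made-up values after the previous context
importance = {name: 1.0 for name in names}  # made-up importances, not real SI estimates

classifier_only = ToyModel(regularize_decoder=False)
all_layers = ToyModel(regularize_decoder=True)

penalty_classifier_only = classifier_only.quadratic_penalty(previous, importance)
penalty_all_layers = all_layers.quadratic_penalty(previous, importance)
```

In this toy setup the decoder parameter only contributes to the penalty when it is listed in `param_list`, which is the behavioral difference between the two repositories described above.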

valeriya-khan commented 1 year ago

Hi! Thank you very much for your help. I was now able to obtain 30% accuracy, which is much better than before. If it is not too much trouble, can you tell me what could be the reason for the remaining 2-5% difference between the brain-inspired replay repository and this repository, even with the all-so-far option? Are there any other implementation differences? Thank you for your help :)

GMvandeVen commented 1 year ago

Hi, there are quite a few other differences between the code in this repository and the other repository, but I haven’t been able to figure out which of those differences could cause a difference in performance when combining BI-R and SI. For example, one quite large difference is that, when a fixed feature extractor is used (i.e., the option --freeze-convE), in this repository all data are put through the feature extractor once at the beginning (which speeds things up considerably), while in the other repository the data are put through the feature extractor every time they are presented to the network. In principle I don’t think this difference should lead to a difference in performance, but perhaps for some reason it does.
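The two ways of using a fixed feature extractor can be sketched as follows. This is a toy illustration, not code from either repository: `CountingExtractor` is a hypothetical, fully deterministic stand-in for a frozen network, and the data are made up. Under those assumptions the two strategies yield identical features; whether that holds in practice depends on details not settled in this thread (for example, any randomness applied to the inputs or inside the extractor).

```python
class CountingExtractor:
    """Hypothetical frozen, deterministic feature extractor that counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, image):
        self.calls += 1
        total = sum(image)
        # Fixed "weights": the same input always maps to the same features.
        return (0.2 * total, -0.1 * total, 0.4 * total)


dataset = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.25)]  # made-up two-pixel "images"
epochs = 3

# Strategy A: put all data through the extractor once, then reuse the stored features.
cached_extractor = CountingExtractor()
cache = [cached_extractor(image) for image in dataset]
features_cached = [cache[i] for _ in range(epochs) for i in range(len(dataset))]

# Strategy B: put the data through the extractor every time they are presented.
live_extractor = CountingExtractor()
features_live = [live_extractor(image) for _ in range(epochs) for image in dataset]

same_features = features_cached == features_live
```

Strategy A calls the extractor once per sample, while strategy B calls it once per sample per epoch, which is where the speed-up mentioned above comes from.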

If it is important to replicate the performance reported in the brain-inspired replay paper, my suggestion would be to use the original repository accompanying that paper (https://github.com/GMvandeVen/brain-inspired-replay). Otherwise it should be fine to use this repository.

valeriya-khan commented 1 year ago

Thank you very much for your explanations. I want to replicate the results here, as I like that this repository includes other methods in addition to generative and regularization methods. Thank you very much for your help. I will close this issue. If you remember something else, please reopen it, or write to my email: khan.lera@gmail.com. Have a nice day!

GMvandeVen / continual-learning

Reproducing BI+SI method #21