GMvandeVen / continual-learning

PyTorch implementation of various methods for continual learning (XdG, EWC, SI, LwF, FROMP, DGR, BI-R, ER, A-GEM, iCaRL, Generative Classifier) in three different scenarios.
MIT License

Performance #10

Closed by Johswald 4 years ago

Johswald commented 4 years ago

hey again!

when I execute ./main.py --ewc --online --lambda=5000 --gamma=1 --scenario task

this should be close to 99% accuracy, no?

For EWC and SI I get much worse performance with the default values. What am I doing wrong? Thank you!

GMvandeVen commented 4 years ago

It is indeed the case that with those hyperparameter values, the performance of Online EWC on the split MNIST task protocol is rather poor. This confused me for quite a while as well. It turns out that on the split MNIST protocol, the hyperparameter values recommended by the developers (or set as defaults) for SI, and especially for EWC, do not work very well. For EWC and Online EWC, lambda even needs to be set several orders of magnitude larger. See also Appendix D and the footnote on page 7 of our paper: https://arxiv.org/pdf/1904.07734.pdf.
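
As a quick reminder of what lambda controls (this is the standard EWC objective from Kirkpatrick et al., written generically rather than exactly as implemented in this repo):

$$\mathcal{L}(\theta) = \mathcal{L}_{\text{current}}(\theta) + \frac{\lambda}{2} \sum_i F_i \left(\theta_i - \theta_i^*\right)^2$$

where $F_i$ is the estimated Fisher information for parameter $\theta_i$ and $\theta_i^*$ is its value after the previous task(s). Since the penalty scales with the product $\lambda \cdot F_i$, a protocol on which the Fisher estimates come out small needs a correspondingly larger lambda before the penalty has any effect.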

Johswald commented 4 years ago

Thanks for the prompt response. OK, I had thought the default values were the ones used to get your reported accuracies. Would it be possible to share the calls? It's a bit hard to read the best values off your hyperparameter search plots. Thanks again for this repo, it's really helpful!

GMvandeVen commented 4 years ago

Ah yes, sorry. It would indeed have been good to at least report those hyperparameter values somewhere. Here are all the calls with the values we selected:

For split MNIST:

./main.py --scenario=task --xdg=0.95
./main.py --scenario=task --ewc --lambda=10000000
./main.py --scenario=task --ewc --online --lambda=100000000 --gamma=0.8
./main.py --scenario=task --si --c=50

./main.py --scenario=domain --ewc --lambda=1000000
./main.py --scenario=domain --ewc --online --lambda=100000000 --gamma=0.7
./main.py --scenario=domain --si --c=500

./main.py --scenario=class --ewc --lambda=100000000
./main.py --scenario=class --ewc --online --lambda=1000000000 --gamma=0.8
./main.py --scenario=class --si --c=0.5

For permuted MNIST:

./main.py --experiment=permMNIST --tasks=10 --scenario=task --xdg=0.55
./main.py --experiment=permMNIST --tasks=10 --scenario=task --ewc --lambda=500
./main.py --experiment=permMNIST --tasks=10 --scenario=task --ewc --online --lambda=500 --gamma=0.8
./main.py --experiment=permMNIST --tasks=10 --scenario=task --si --c=5

./main.py --experiment=permMNIST --tasks=10 --scenario=domain --ewc --lambda=500
./main.py --experiment=permMNIST --tasks=10 --scenario=domain --ewc --online --lambda=1000 --gamma=0.9
./main.py --experiment=permMNIST --tasks=10 --scenario=domain --si --c=5

./main.py --experiment=permMNIST --tasks=10 --scenario=class --ewc --lambda=1
./main.py --experiment=permMNIST --tasks=10 --scenario=class --ewc --online --lambda=5 --gamma=1
./main.py --experiment=permMNIST --tasks=10 --scenario=class --si --c=0.1
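
In case it saves someone some typing: below is a minimal sketch that loops over the split MNIST calls above. It assumes main.py is executable from the repository root and accepts the flags exactly as listed; the script itself is illustrative and not part of the repo.

```python
import subprocess

# Selected hyperparameter values for split MNIST (copied from the calls above).
SPLIT_MNIST_CALLS = [
    ["--scenario=task", "--xdg=0.95"],
    ["--scenario=task", "--ewc", "--lambda=10000000"],
    ["--scenario=task", "--ewc", "--online", "--lambda=100000000", "--gamma=0.8"],
    ["--scenario=task", "--si", "--c=50"],
    ["--scenario=domain", "--ewc", "--lambda=1000000"],
    ["--scenario=domain", "--ewc", "--online", "--lambda=100000000", "--gamma=0.7"],
    ["--scenario=domain", "--si", "--c=500"],
    ["--scenario=class", "--ewc", "--lambda=100000000"],
    ["--scenario=class", "--ewc", "--online", "--lambda=1000000000", "--gamma=0.8"],
    ["--scenario=class", "--si", "--c=0.5"],
]

for args in SPLIT_MNIST_CALLS:
    # Each call is one full training run; check=True aborts on the first failure.
    subprocess.run(["./main.py", *args], check=True)
```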

Johswald commented 4 years ago

Thank you again!

chongyi-zheng commented 3 years ago

Hello there @GMvandeVen. I am trying to run the EWC and SI experiments with your hyperparameters, but with the following commands the average precision is poor.

./main.py --scenario=class --ewc --lambda=100000000
./main.py --scenario=class --si --c=0.5
./main.py --experiment=permMNIST --tasks=10 --scenario=class --ewc --lambda=1
./main.py --experiment=permMNIST --tasks=10 --scenario=class --si --c=0.1

However, the commands for the task scenario work well.

./main.py --scenario=task --ewc --lambda=10000000
./main.py --scenario=task --si --c=50
./main.py --experiment=permMNIST --tasks=10 --scenario=task --ewc --lambda=500
./main.py --experiment=permMNIST --tasks=10 --scenario=task --si --c=5

Any suggestion?

GMvandeVen commented 3 years ago

Hi @YeeCY, thanks for your interest in my code. The observation you describe is correct: EWC and SI do not work well with class-incremental learning (--scenario=class), even with their best hyperparameters, while they do work reasonably well with task-incremental learning (--scenario=task). See for example this paper (https://arxiv.org/abs/1904.07734) for more details on the difference between these scenarios. Hope this helps!
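
To make the difference between the scenarios concrete, here is an illustrative sketch of how prediction differs at test time (the function and its arguments are hypothetical, not the repo's actual API): with task-incremental learning the task identity is given, so the model only has to choose among that task's classes, while with class-incremental learning it has to choose among all classes seen so far.

```python
import torch

def predict(logits: torch.Tensor, scenario: str, active_classes: list) -> torch.Tensor:
    """Illustrative only. `logits` has shape [batch, n_classes_total];
    `active_classes` lists the class indices of the current task."""
    if scenario == "task":
        # Task-IL: task identity is known, so mask out all other classes.
        masked = torch.full_like(logits, float("-inf"))
        masked[:, active_classes] = logits[:, active_classes]
        return masked.argmax(dim=1)
    # Class-IL: no task identity, so the prediction is over all classes
    # seen so far; this is the setting where EWC and SI break down.
    return logits.argmax(dim=1)
```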

chongyi-zheng commented 3 years ago

> Hi @YeeCY, thanks for your interest in my code. The observation you describe is correct: EWC and SI do not work well with class-incremental learning (--scenario=class), even with their best hyperparameters, while they do work reasonably well with task-incremental learning (--scenario=task). See for example this paper (https://arxiv.org/abs/1904.07734) for more details on the difference between these scenarios. Hope this helps!

OK, that's a good summary, and I will try running with task-incremental learning. By the way, would you mind providing the best hyperparameters for other algorithms, like A-GEM?