Closed: MiaoYongbiao closed this issue 5 years ago
In models/resnet32.py, I use a single 100-unit FC layer instead of adding 10 units per incremental step, and I select the corresponding units in the distillation loss. I think this is equivalent to the concat layer in your forkresnet.m.
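For clarity, here is a minimal PyTorch sketch of what I mean by selecting the corresponding units; the batch size, classes_per_step, and step values are illustrative placeholders, not my actual settings.

```python
import torch

# Hypothetical sketch: with a single 100-unit FC head, pick out the units
# that belong to the classes seen in earlier steps, mimicking a per-step
# classifier concat. All concrete values here are illustrative.
logits = torch.randn(8, 100)        # batch of outputs from the 100-unit head
classes_per_step = 10
step = 3                            # 0-based index of the current step
n_old = step * classes_per_step     # units owned by the previous steps
old_logits = logits[:, :n_old]      # slice fed to the distillation term
```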
@fmcp In datahandler/dataset.py, I implement the same data augmentation as yours in the CIFAR-100 function.
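Roughly, my CIFAR-100 transform looks like the sketch below (random brightness change, random crop, mirroring); the jitter strength and normalization statistics here are common defaults, not necessarily the values from your code.

```python
import torchvision.transforms as T

# A minimal sketch of the CIFAR-100 training augmentation; the exact
# parameter values are assumptions, not the repo's settings.
train_transform = T.Compose([
    T.ColorJitter(brightness=0.25),   # random brightness change
    T.RandomCrop(32, padding=4),      # random crop with zero padding
    T.RandomHorizontalFlip(),         # mirroring
    T.ToTensor(),
    T.Normalize(mean=(0.5071, 0.4865, 0.4409),
                std=(0.2673, 0.2564, 0.2762)),  # common CIFAR-100 statistics
])
```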
And another question: what is the base-model training setting (before incremental training)? I often get 85-90% base-model accuracy, so I think the overfitted base model leads to poor incremental performance. With gradient noise and L2 regularization, I bring the base-model accuracy down to 80%. But I only get 74% accuracy on the first of the 10 incremental steps, which is far below the accuracy reported in the paper.
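For reference, this is roughly how I apply the two regularizers; the model is a stand-in, L2 goes through SGD's weight_decay, and the gradient-noise schedule constants follow Neelakantan et al. rather than tuned values.

```python
import torch
import torch.nn as nn

model = nn.Linear(64, 100)  # placeholder for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)  # L2 term

def add_gradient_noise(model, step, eta=0.3, gamma=0.55):
    # annealed Gaussian noise: variance decays as eta / (1 + step)**gamma
    std = (eta / (1 + step) ** gamma) ** 0.5
    for p in model.parameters():
        if p.grad is not None:
            p.grad.add_(torch.randn_like(p.grad) * std)

# inside the training loop, after loss.backward():
#     add_gradient_noise(model, global_step)
#     optimizer.step()
```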
Sorry, but I don't use PyTorch, so I can only help you if you use the code published in this repo.
Thanks for your interesting work in this paper! Over the last two weeks, I have been implementing a PyTorch version of this work. Referring to the iCaRL repositories and your published source code, I implemented the cross-distilled loss function, data augmentation, and gradient noise. But something is wrong: I haven't been able to reproduce the CIFAR-100 accuracy over 10 incremental steps. I have uploaded my version to my GitHub. The main modified file is trainer/trainer.py, and the necessary remarks have been added. Please give me some advice when you have time. Thanks!
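For reference, here is a minimal sketch of the cross-distilled loss as I implemented it; n_old, the temperature T, and the 1:1 weighting are placeholders, and I collapse the per-step distillation terms into a single slice over the old classes for brevity.

```python
import torch
import torch.nn.functional as F

# Hedged sketch of a cross-distilled loss: classification cross-entropy
# over all seen classes plus a temperature-scaled distillation term on
# the old classes' logits. n_old, T, and the equal weighting are
# assumptions, not the paper's exact settings.
def cross_distilled_loss(new_logits, old_logits, targets, n_old, T=2.0):
    ce = F.cross_entropy(new_logits, targets)           # classification loss
    log_p = F.log_softmax(new_logits[:, :n_old] / T, dim=1)
    q = F.softmax(old_logits[:, :n_old] / T, dim=1)
    kd = F.kl_div(log_p, q, reduction="batchmean") * (T * T)  # distillation
    return ce + kd
```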