arthurdouillard / incremental_learning.pytorch

A collection of incremental learning paper implementations including PODNet (ECCV20) and Ghost (CVPR-W21).
MIT License
388 stars 60 forks

Could you share the config file for your UCIR for imagenet100? #41

Closed RamMohan112 closed 3 years ago

RamMohan112 commented 3 years ago

This is what I am using right now:

```yaml
dataset: imagenet100

model: ucir
convnet: rebuffi
convnet_config:
  last_relu: false

batch_size: 64
memory_size: 2000
fixed_memory: true

classifier_config:
  scaling: 1
  gamma: 1
  type: cosine
  proxy_per_class: 1
  distance: neg_stable_cosine_distance

less_forget:
  scheduled_factor: true
  lambda: 10

postprocessor_config:
  initial_value: 1.0
  type: learned_scaling

ranking_loss:
  factor: 1.0
  nb_negatives: 2
  margin: 0.5

finetuning_config:
  tuning: classifier
  lr: 0.01
  epochs: 20

lr: 0.1
weight_decay: 0.0001
scheduling:
  type: step
  epochs: [30, 60]
  gamma: 0.1
lr_decay: 0.1
optimizer: sgd
epochs: 90

weight_generation:
  type: imprinted
```
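For reference, the `scheduling` block above is a standard step decay: the learning rate starts at 0.1 and is multiplied by `gamma = 0.1` at epochs 30 and 60, over 90 epochs total. A minimal sketch of that schedule (the function name `step_lr` is mine, not from the repo):

```python
def step_lr(base_lr, milestones, gamma, epoch):
    """Step decay: multiply base_lr by gamma once for each milestone already passed."""
    factor = gamma ** sum(1 for m in milestones if epoch >= m)
    return base_lr * factor

# Values from the config above: lr 0.1, milestones [30, 60], gamma 0.1.
print(step_lr(0.1, [30, 60], 0.1, 0))   # 0.1 before the first milestone
print(step_lr(0.1, [30, 60], 0.1, 45))  # ~0.01 after epoch 30
print(step_lr(0.1, [30, 60], 0.1, 80))  # ~0.001 after epoch 60
```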

But on the base task (accuracy after training on the first task, i.e. joint training on the first 20 classes of a 20-20-20-20-20 split) I am getting a difference of around 8%. Am I doing something incorrect? I also noticed that base task performance is about 3% higher than with the UCIR paper's implementation. I first thought this might be due to the number of proxies per class used in PODNet, but even when I reduced it to 1 in your config file it showed a similar 3% increase. Any idea why it might perform better on joint training of the base task?

arthurdouillard commented 3 years ago

Hey,

Have you looked at my recent commit that fixed a bug in UCIR? https://github.com/arthurdouillard/incremental_learning.pytorch/commit/fb997b869d4b39176a8df5f6a441533c092c8db4

The hyperparameters seem ok, although I would advise you to double-check them against the papers. Note that the finetuning phase isn't mentioned in the UCIR paper, only in their official code, so you need to look into that too (not fun, I know...).

I think for the classifier distance, they would simply use "cosine".
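For intuition, a plain cosine classifier scores a sample by the cosine similarity between its feature vector and each class weight vector, optionally multiplied by a learned scale (the `neg_stable_cosine_distance` option in the config is a variant of the same idea). A minimal pure-Python sketch, with names of my own choosing:

```python
from math import sqrt

def cosine_logit(feature, weight, scale=1.0):
    """scale * cos(theta) between a feature vector and one class weight vector."""
    dot = sum(f * w for f, w in zip(feature, weight))
    norm_f = sqrt(sum(f * f for f in feature))
    norm_w = sqrt(sum(w * w for w in weight))
    return scale * dot / (norm_f * norm_w)

# Logits are bounded in [-scale, scale], unlike a plain linear layer.
print(cosine_logit([1.0, 0.0], [2.0, 0.0]))   # 1.0 (same direction)
print(cosine_logit([1.0, 0.0], [0.0, 3.0]))   # 0.0 (orthogonal)
print(cosine_logit([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite)
```

With `proxy_per_class: 1` there is exactly one such weight vector per class; a sample is assigned to the class with the highest cosine logit.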

For the convnet, you want to use a ResNet18 with `nf=64`.

Hope this helps! Please note that the UCIR results in my paper were reproduced using their official code base, not my code.