Open jmin0530 opened 2 years ago
See https://github.com/arthurdouillard/dytox/blob/main/erratum_distributed.md
You probably want to use the global memory with 2,000 exemplars.
If you use the distributed memory with 1,000 exemplars, your effective memory size is rather low (much lower than 2,000).
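To make the difference concrete, here is a toy calculation (not code from this repo) of the per-class rehearsal budget on CIFAR-100 for each effective memory size, assuming an iCaRL-style fixed total memory shared across all seen classes:

```python
# Toy calculation (not DyTox code): per-class rehearsal budget on CIFAR-100,
# assuming a fixed total exemplar memory shared across all seen classes.
NUM_CLASSES = 100

for effective_memory in (2000, 1000):
    per_class = effective_memory // NUM_CLASSES
    print(f"effective memory {effective_memory}: "
          f"{per_class} exemplars per class once all {NUM_CLASSES} classes are seen")

# effective memory 2000: 20 exemplars per class once all 100 classes are seen
# effective memory 1000: 10 exemplars per class once all 100 classes are seen
```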
Hello, thank you for your code. I used the DyTox settings for the 50-step scenario, but I got different results from your paper.
I ran the CLI command below.
According to your paper, the result on CIFAR-100 with 50 steps is "Avg acc: 64.82, Last acc: 45.61". Here are my reproduction results for the three CIFAR-100 class orders:
I am also attaching the DyTox settings I used:
DyTox, for CIFAR100

Model definition
  model: convit
  embed_dim: 384
  depth: 6
  num_heads: 12
  patch_size: 4
  input_size: 32
  local_up_to_layer: 5
  class_attention: true

Training setting
  no_amp: true
  eval_every: 50

Base hyperparameter
  weight_decay: 0.000001
  batch_size: 128
  incremental_batch_size: 256
  incremental_lr: 0.0005
  rehearsal: icarl_all

Knowledge Distillation
  auto_kd: true

Finetuning
  finetuning: balanced
  finetuning_epochs: 20
  ft_no_sampling: true

Dytox model
  dytox: true
  freeze_task: [old_task_tokens, old_heads]
  freeze_ft: [sab]

Divergence head to get diversity
  head_div: 0.1
  head_div_mode: tr

Independent Classifiers
  ind_clf: 1-1
  bce_loss: true

Advanced Augmentations, here disabled

Erasing
  reprob: 0.0
  remode: pixel
  recount: 1
  resplit: false

MixUp & CutMix
  mixup: 0.0
  cutmix: 0.0
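For completeness, this is how I sanity-check that such a YAML options file is actually picked up before launching (a generic sketch using PyYAML, not the repo's own option loader; the file name `dytox_cifar100.yaml` is just a placeholder):

```python
# Generic sketch (not the repo's option loader): read a YAML options file like
# the one above and overlay its values on top of argparse defaults.
import argparse
import yaml  # PyYAML

def apply_yaml_options(args: argparse.Namespace, path: str) -> argparse.Namespace:
    with open(path) as f:
        options = yaml.safe_load(f) or {}
    for key, value in options.items():
        setattr(args, key, value)  # YAML values override the parser defaults
    return args

# Minimal usage; a real parser would define every key appearing in the file.
parser = argparse.ArgumentParser()
parser.add_argument("--batch_size", type=int, default=64)
parser.add_argument("--incremental_lr", type=float, default=0.001)
args = apply_yaml_options(parser.parse_args([]), "dytox_cifar100.yaml")  # placeholder path
print(args.batch_size, args.incremental_lr)
```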
I can't understand why my reproduction results differ from the results reported in your paper. Thank you.