arthurdouillard / dytox

Dynamic Token Expansion with Continual Transformers, accepted at CVPR 2022
https://arxiv.org/abs/2111.11326
Apache License 2.0
134 stars 17 forks

convit object has no attribute 'module' #1

Closed: Kishaan closed this issue 2 years ago

Kishaan commented 2 years ago

Hi,

I'm running your code for CIFAR-100 with the ConVit backbone (as suggested in the README file). I'm running into this error when the rehearsal memory is being updated.

/volumes1/Home/anaconda3/envs/timm-env/bin/python /volumes2/Other/dytox/main.py --options options/data/cifar100_10-10.yaml options/data/cifar100_order1.yaml options/model/cifar_dytox.yaml --name dytox --data-path logs/ --output-basedir outputs/ --patch-size 4 --epochs 2 --base-epochs 2
Not using distributed mode
Namespace(aa='rand-m9-mstd0.5-inc1', auto_kd=True, base_epochs=2, batch_size=128, bce_loss=True, class_attention=True, class_order=[87, 0, 52, 58, 44, 91, 68, 97, 51, 15, 94, 92, 10, 72, 49, 78, 61, 14, 8, 86, 84, 96, 18, 24, 32, 45, 88, 11, 4, 67, 69, 66, 77, 47, 79, 93, 29, 50, 57, 83, 17, 81, 41, 12, 37, 59, 25, 20, 80, 73, 1, 28, 6, 46, 62, 82, 53, 9, 31, 75, 38, 63, 33, 74, 27, 22, 36, 3, 16, 21, 60, 19, 70, 90, 89, 43, 5, 42, 65, 76, 40, 30, 23, 85, 2, 95, 56, 48, 71, 64, 98, 13, 99, 7, 34, 55, 54, 26, 35, 39], clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=0.0, cutmix_minmax=None, data_path='logs/', data_set='CIFAR', debug=False, decay_epochs=30, decay_rate=0.1, depth=6, device='cuda', dist_eval=False, dist_url='env://', distillation_tau=1.0, distributed=False, drop=0.0, drop_path=0.1, dytox=True, embed_dim=384, epochs=2, eval=False, eval_every=50, finetuning='balanced', finetuning_epochs=20, finetuning_lr=5e-05, finetuning_resetclf=False, finetuning_teacher=False, fixed_memory=False, freeze_eval=False, freeze_ft=['sab'], freeze_task=['old_task_tokens', 'old_heads'], head_div=0.1, head_div_mode='tr', inat_category='name', increment=10, incremental_batch_size=128, incremental_lr=0.0005, incremental_warmup_lr=None, ind_clf='1-1', initial_increment=10, input_size=32, joint_tokens=False, local_rank=None, local_up_to_layer=5, locality_strength=1.0, log_category='10-10', log_dir='logs/cifar/10-10/22-03/week-4/25_dytox', log_path='logs', look_sam_alpha=0.7, look_sam_k=0, lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, max_task=None, memory_size=2000, min_lr=1e-05, mixup=0.0, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='convit', momentum=0.9, name='dytox', no_amp=True, norm='layer', num_heads=12, num_workers=0, only_ft=False, opt='adamw', opt_betas=None, opt_eps=1e-08, options=['options/data/cifar100_10-10.yaml', 'options/data/cifar100_order1.yaml', 'options/model/cifar_dytox.yaml'], output_basedir='outputs/', output_dir='', patch_size=4, patience_epochs=10, pin_mem=True, recount=1, rehearsal='icarl_all', remode='pixel', repeated_aug=True, replay_memory=0, reprob=0.0, resplit=False, resume='', sam_adaptive=False, sam_div='', sam_final=None, sam_first='main', sam_mode=['tr', 'ft'], sam_rho=0.0, sam_second='main', sam_skip_first=False, save_every_epoch=None, sched='cosine', seed=0, sep_memory=False, smoothing=0.1, start_epoch=0, start_task=0, train_interpolation='bicubic', trial_id=1, validation=0.0, warmup_epochs=5, warmup_lr=1e-06, weight_decay=1e-06, world_size=1)
Files already downloaded and verified
Files already downloaded and verified
Creating model: convit
kdytox\
number of params: 10689334
Starting task id 0/9
Creating DyTox!
Adding new parameters
Start training for 2 epochs
Image size is torch.Size([128, 3, 32, 32]).
Task: [0] Epoch: [0]  [ 0/39]  eta: 0:00:20  lr: 0.000001  loss: 0.6984 (0.6984)  time: 0.5359  data: 0.0451  max mem: 1854
Task: [0] Epoch: [0]  [10/39]  eta: 0:00:04  lr: 0.000001  loss: 0.6794 (0.6781)  time: 0.1599  data: 0.0444  max mem: 1982
Task: [0] Epoch: [0]  [20/39]  eta: 0:00:02  lr: 0.000001  loss: 0.6494 (0.6559)  time: 0.1219  data: 0.0436  max mem: 1982
Task: [0] Epoch: [0]  [30/39]  eta: 0:00:01  lr: 0.000001  loss: 0.6122 (0.6364)  time: 0.1215  data: 0.0430  max mem: 1982
Task: [0] Epoch: [0]  [38/39]  eta: 0:00:00  lr: 0.000001  loss: 0.5855 (0.6199)  time: 0.1216  data: 0.0432  max mem: 1982
Task: [0] Epoch: [0] Total time: 0:00:05 (0.1325 s / it)
Averaged stats: lr: 0.000001  loss: 0.5855 (0.6199)
Test:  [0/6]  eta: 0:00:00  loss: 2.2756 (2.2756)  acc1: 13.5417 (13.5417)  acc5: 59.8958 (59.8958)  time: 0.1295  data: 0.0597  max mem: 1982
Test:  [5/6]  eta: 0:00:00  loss: 2.2756 (2.2783)  acc1: 15.1042 (15.5000)  acc5: 59.3750 (58.5000)  time: 0.0584  data: 0.0277  max mem: 1982
Test: Total time: 0:00:00 (0.0585 s / it)
* Acc@1 15.500  loss 2.278
Accuracy of the network on the 1000 test images: 15.5%
Max accuracy: 15.50%
Image size is torch.Size([128, 3, 32, 32]).
Task: [0] Epoch: [1]  [ 0/39]  eta: 0:00:04  lr: 0.000001  loss: 0.5338 (0.5338)  time: 0.1238  data: 0.0449  max mem: 1982
Task: [0] Epoch: [1]  [10/39]  eta: 0:00:03  lr: 0.000001  loss: 0.5266 (0.5255)  time: 0.1227  data: 0.0442  max mem: 1982
Task: [0] Epoch: [1]  [20/39]  eta: 0:00:02  lr: 0.000001  loss: 0.5165 (0.5164)  time: 0.1216  data: 0.0432  max mem: 1982
Task: [0] Epoch: [1]  [30/39]  eta: 0:00:01  lr: 0.000001  loss: 0.4914 (0.5060)  time: 0.1222  data: 0.0434  max mem: 1982
Task: [0] Epoch: [1]  [38/39]  eta: 0:00:00  lr: 0.000001  loss: 0.4786 (0.4984)  time: 0.1225  data: 0.0436  max mem: 1982
Task: [0] Epoch: [1] Total time: 0:00:04 (0.1222 s / it)
Averaged stats: lr: 0.000001  loss: 0.4786 (0.4984)
Test:  [0/6]  eta: 0:00:00  loss: 2.2362 (2.2362)  acc1: 18.2292 (18.2292)  acc5: 65.6250 (65.6250)  time: 0.0690  data: 0.0447  max mem: 1982
Test:  [5/6]  eta: 0:00:00  loss: 2.2275 (2.2374)  acc1: 18.2292 (18.1000)  acc5: 62.5000 (62.7000)  time: 0.0479  data: 0.0259  max mem: 1982
Test: Total time: 0:00:00 (0.0480 s / it)
* Acc@1 18.100  loss 2.237
Accuracy of the network on the 1000 test images: 18.1%
Max accuracy: 18.10%
Traceback (most recent call last):
  File "/volumes2/Other/dytox/main.py", line 733, in <module>
    main(args)
  File "/volumes2/Other/dytox/main.py", line 590, in main
    memory.add(scenario_train[task_id], model, args.initial_increment if task_id == 0 else args.increment)
  File "/volumes2/Other/dytox/continual/rehearsal.py", line 68, in add
    x, y, t = herd_samples(dataset, model, self.memory_per_class, self.rehearsal)
  File "/volumes2/Other/dytox/continual/rehearsal.py", line 146, in herd_samples
    features, targets = extract_features(dataset, model, handling)
  File "/volumes2/Other/dytox/continual/rehearsal.py", line 181, in extract_features
    feats, _, _ = model.module.forward_features(x.cuda())
  File "/volumes1/Home/anaconda3/envs/timm-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 947, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'ConVit' object has no attribute 'module'

Process finished with exit code 1
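
As far as I can tell, extract_features accesses model.module, which only exists when the model is wrapped in DistributedDataParallel; since I'm not using distributed mode, the raw ConVit is passed in. A minimal sketch of a defensive unwrap (unwrap_model is a hypothetical helper, not from the repo):

import torch.nn as nn

def unwrap_model(model: nn.Module) -> nn.Module:
    # Return the wrapped module for (Distributed)DataParallel models,
    # and the model itself otherwise. Hypothetical helper, not in the repo.
    return model.module if hasattr(model, "module") else model

# e.g. in extract_features (continual/rehearsal.py):
# feats, _, _ = unwrap_model(model).forward_features(x.cuda())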

I'm running 2 epochs per task (just to understand the structure of the code), and these are the arguments I'm using:

--options options/data/cifar100_10-10.yaml options/data/cifar100_order1.yaml options/model/cifar_dytox.yaml --name dytox --data-path logs/ --output-basedir outputs/ --patch-size 4 --epochs 2 --base-epochs 2

I also noticed that the training and validation loops always use the classification head inside ConVit and never use the ContinualClassifier inside dytox.py. Is that expected?

After the first task, ConVit's classifier weights have changed (compared to the initialized weights), but the DyTox module's ContinualClassifier still has the same weights, and these unchanged weights are frozen before the second task. I was expecting ConVit's weights to be copied to the ContinualClassifier after every task. In short, I would like to know how you save the updated classifier weights of the previous task before moving to the next one.

Any clarification regarding this would be very helpful! Thank you!

Kishaan commented 2 years ago

On a different note, after training the first task (again with the same settings as above, 2 epochs per task), I'm getting this error in the bce_with_logits function:

/volumes1/Home/anaconda3/envs/timm-env/bin/python /volumes2/Other/dytox/main.py --options options/data/cifar100_10-10.yaml options/data/cifar100_order1.yaml options/model/cifar_dytox.yaml --name dytox --data-path logs/ --output-basedir outputs/ --patch-size 4 --epochs 2 --base-epochs 2
Not using distributed mode
Namespace(aa='rand-m9-mstd0.5-inc1', auto_kd=True, base_epochs=2, batch_size=128, bce_loss=True, class_attention=True, class_order=[87, 0, 52, 58, 44, 91, 68, 97, 51, 15, 94, 92, 10, 72, 49, 78, 61, 14, 8, 86, 84, 96, 18, 24, 32, 45, 88, 11, 4, 67, 69, 66, 77, 47, 79, 93, 29, 50, 57, 83, 17, 81, 41, 12, 37, 59, 25, 20, 80, 73, 1, 28, 6, 46, 62, 82, 53, 9, 31, 75, 38, 63, 33, 74, 27, 22, 36, 3, 16, 21, 60, 19, 70, 90, 89, 43, 5, 42, 65, 76, 40, 30, 23, 85, 2, 95, 56, 48, 71, 64, 98, 13, 99, 7, 34, 55, 54, 26, 35, 39], clip_grad=None, color_jitter=0.4, cooldown_epochs=10, cutmix=0.0, cutmix_minmax=None, data_path='logs/', data_set='CIFAR', debug=False, decay_epochs=30, decay_rate=0.1, depth=6, device='cuda', dist_eval=False, dist_url='env://', distillation_tau=1.0, distributed=False, drop=0.0, drop_path=0.1, dytox=True, embed_dim=384, epochs=2, eval=False, eval_every=50, finetuning='balanced', finetuning_epochs=20, finetuning_lr=5e-05, finetuning_resetclf=False, finetuning_teacher=False, fixed_memory=False, freeze_eval=False, freeze_ft=['sab'], freeze_task=['old_task_tokens', 'old_heads'], head_div=0.1, head_div_mode='tr', inat_category='name', increment=10, incremental_batch_size=128, incremental_lr=0.0005, incremental_warmup_lr=None, ind_clf='1-1', initial_increment=10, input_size=32, joint_tokens=False, local_rank=None, local_up_to_layer=5, locality_strength=1.0, log_category='10-10', log_dir='logs/cifar/10-10/22-03/week-4/25_dytox', log_path='logs', look_sam_alpha=0.7, look_sam_k=0, lr=0.0005, lr_noise=None, lr_noise_pct=0.67, lr_noise_std=1.0, max_task=None, memory_size=2000, min_lr=1e-05, mixup=0.0, mixup_mode='batch', mixup_prob=1.0, mixup_switch_prob=0.5, model='convit', momentum=0.9, name='dytox', no_amp=True, norm='layer', num_heads=12, num_workers=0, only_ft=False, opt='adamw', opt_betas=None, opt_eps=1e-08, options=['options/data/cifar100_10-10.yaml', 'options/data/cifar100_order1.yaml', 'options/model/cifar_dytox.yaml'], output_basedir='outputs/', output_dir='', patch_size=4, patience_epochs=10, pin_mem=True, recount=1, rehearsal='icarl_all', remode='pixel', repeated_aug=True, replay_memory=0, reprob=0.0, resplit=False, resume='', sam_adaptive=False, sam_div='', sam_final=None, sam_first='main', sam_mode=['tr', 'ft'], sam_rho=0.0, sam_second='main', sam_skip_first=False, save_every_epoch=None, sched='cosine', seed=0, sep_memory=False, smoothing=0.1, start_epoch=0, start_task=0, train_interpolation='bicubic', trial_id=1, validation=0.0, warmup_epochs=5, warmup_lr=1e-06, weight_decay=1e-06, world_size=1)
Files already downloaded and verified
Files already downloaded and verified
Creating model: convit
kdytox\
number of params: 10689334
Starting task id 0/9
Creating DyTox!
Adding new parameters
Start training for 2 epochs
Image size is torch.Size([128, 3, 32, 32]).
Task: [0] Epoch: [0]  [ 0/39]  eta: 0:00:20  lr: 0.000001  loss: 0.6905 (0.6905)  time: 0.5381  data: 0.0460  max mem: 1854
Task: [0] Epoch: [0]  [10/39]  eta: 0:00:04  lr: 0.000001  loss: 0.6784 (0.6763)  time: 0.1629  data: 0.0450  max mem: 1982
Task: [0] Epoch: [0]  [20/39]  eta: 0:00:02  lr: 0.000001  loss: 0.6524 (0.6549)  time: 0.1247  data: 0.0444  max mem: 1982
Task: [0] Epoch: [0]  [30/39]  eta: 0:00:01  lr: 0.000001  loss: 0.6127 (0.6364)  time: 0.1240  data: 0.0442  max mem: 1982
Task: [0] Epoch: [0]  [38/39]  eta: 0:00:00  lr: 0.000001  loss: 0.5865 (0.6195)  time: 0.1251  data: 0.0455  max mem: 1982
Task: [0] Epoch: [0] Total time: 0:00:05 (0.1355 s / it)
Averaged stats: lr: 0.000001  loss: 0.5865 (0.6195)
Test:  [0/6]  eta: 0:00:00  loss: 2.2770 (2.2770)  acc1: 14.0625 (14.0625)  acc5: 59.3750 (59.3750)  time: 0.1299  data: 0.0595  max mem: 1982
Test:  [5/6]  eta: 0:00:00  loss: 2.2770 (2.2788)  acc1: 15.6250 (15.9000)  acc5: 59.3750 (58.5000)  time: 0.0570  data: 0.0255  max mem: 1982
Test: Total time: 0:00:00 (0.0571 s / it)
* Acc@1 15.900  loss 2.279
Accuracy of the network on the 1000 test images: 15.9%
Max accuracy: 15.90%
Image size is torch.Size([128, 3, 32, 32]).
Task: [0] Epoch: [1]  [ 0/39]  eta: 0:00:04  lr: 0.000001  loss: 0.5388 (0.5388)  time: 0.1256  data: 0.0424  max mem: 1982
Task: [0] Epoch: [1]  [10/39]  eta: 0:00:03  lr: 0.000001  loss: 0.5310 (0.5286)  time: 0.1236  data: 0.0427  max mem: 1982
Task: [0] Epoch: [1]  [20/39]  eta: 0:00:02  lr: 0.000001  loss: 0.5139 (0.5189)  time: 0.1225  data: 0.0424  max mem: 1982
Task: [0] Epoch: [1]  [30/39]  eta: 0:00:01  lr: 0.000001  loss: 0.4955 (0.5078)  time: 0.1230  data: 0.0434  max mem: 1982
Task: [0] Epoch: [1]  [38/39]  eta: 0:00:00  lr: 0.000001  loss: 0.4796 (0.5000)  time: 0.1236  data: 0.0438  max mem: 1982
Task: [0] Epoch: [1] Total time: 0:00:04 (0.1233 s / it)
Averaged stats: lr: 0.000001  loss: 0.4796 (0.5000)
Test:  [0/6]  eta: 0:00:00  loss: 2.2385 (2.2385)  acc1: 18.2292 (18.2292)  acc5: 65.6250 (65.6250)  time: 0.0507  data: 0.0262  max mem: 1982
Test:  [5/6]  eta: 0:00:00  loss: 2.2265 (2.2373)  acc1: 18.2292 (18.1000)  acc5: 62.5000 (63.2000)  time: 0.0432  data: 0.0218  max mem: 1982
Test: Total time: 0:00:00 (0.0433 s / it)
* Acc@1 18.100  loss 2.237
Accuracy of the network on the 1000 test images: 18.1%
Max accuracy: 18.10%
Test:  [0/6]  eta: 0:00:00  loss: 2.2385 (2.2385)  acc1: 18.2292 (18.2292)  acc5: 65.6250 (65.6250)  time: 0.0794  data: 0.0548  max mem: 1982
Test:  [5/6]  eta: 0:00:00  loss: 2.2265 (2.2373)  acc1: 18.2292 (18.1000)  acc5: 62.5000 (63.2000)  time: 0.0465  data: 0.0249  max mem: 1982
Test: Total time: 0:00:00 (0.0466 s / it)
* Acc@1 18.100  loss 2.237
Accuracy of the network on the 1000 test images: 18.1%
Max accuracy: 18.10%
Starting task id 1/9
2000 samples added from memory.
Updating ensemble, new embed dim 384.
Adding new parameters
Start training for 2 epochs
Image size is torch.Size([128, 3, 32, 32]).
Traceback (most recent call last):
  File "/volumes2/Other/dytox/main.py", line 733, in <module>
    main(args)
  File "/volumes2/Other/dytox/main.py", line 540, in main
    train_stats = train_one_epoch(
  File "/volumes2/Other/dytox/continual/engine.py", line 60, in train_one_epoch
    loss_tuple = forward(samples, targets, model, teacher_model, criterion, lam, args)
  File "/volumes2/Other/dytox/continual/engine.py", line 162, in forward
    loss = criterion(main_output, targets) # bce_with_logits
  File "/volumes2/Other/dytox/continual/losses.py", line 71, in bce_with_logits
    torch.eye(x.shape[1])[y].to(y.device)
IndexError: index 12 is out of bounds for dimension 0 with size 10

Process finished with exit code 1
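
If it helps, this IndexError is easy to reproduce in isolation: torch.eye(n)[y] builds one-hot targets with n rows, so any label >= n (here, task 1 labels against a head that still has only 10 outputs) is out of bounds. A standalone sketch with made-up values:

import torch

logits = torch.randn(4, 10)             # head still has 10 outputs (task 0 classes)
targets = torch.tensor([3, 7, 12, 15])  # made-up labels; 12 and 15 come from task 1

# One-hot construction in the style of continual/losses.py:
onehot = torch.eye(logits.shape[1])[targets]
# -> IndexError: index 12 is out of bounds for dimension 0 with size 10
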
arthurdouillard commented 2 years ago

Hey,

Thanks for your interest in my work :)

Both of your errors came from the fact that you didn't launch with the train.sh script, as advised in the README, and were therefore not using distributed mode (which also works very well on a single GPU).

I've made a few modifications to the code base so it can also be used with main.py.
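
The gist of it is the usual pattern of keeping a handle to the unwrapped model, so nothing downstream has to go through .module (a sketch of the idea under that assumption, not the exact commit):

import torch
import torch.nn as nn

def maybe_wrap(model: nn.Module, distributed: bool):
    # Keep a handle to the raw model so non-distributed runs never
    # need .module; assumes init_process_group was already called
    # when distributed=True. Illustrative only.
    model_without_ddp = model
    if distributed:
        model = nn.parallel.DistributedDataParallel(
            model, device_ids=[torch.cuda.current_device()])
        model_without_ddp = model.module
    return model, model_without_ddp

# Downstream code calls model_without_ddp.forward_features(...)
# instead of model.module.forward_features(...), so both modes work.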

If you run into errors, please check the README first; most of them are answered there. If not, don't hesitate to open an issue :)