Westlake-AI / openmixup

CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark
https://openmixup.readthedocs.io
Apache License 2.0

How to load .pth into the model #45

Closed: zeyuanyin closed this issue 1 year ago

zeyuanyin commented 1 year ago

If I use bash tools/dist_train.sh configs/classification/tiny_imagenet/automix/basic/r18_l2_a2_near_mb_mlr1e_3_bb_mlr5e_2.py 1 to get the trained weights epoch_400.pth, how can I load epoch_400.pth into the ResNet-18 model for another task?

# https://github.com/Westlake-AI/openmixup/blob/main/configs/classification/tiny_imagenet/automix/basic/r18_l2_a2_near_mb_mlr1e_3_bb_mlr5e_2.py
model_r18_automix = dict(
    type='AutoMixup',
    pretrained=None,
    alpha=2.0,
    momentum=0.999,  # 0.999 to 0.999999
    mask_layer=2,
    mask_loss=0.1,  # using mask loss
    mask_adjust=0,  # prob of adjusting bb mask in terms of lam by mixup, 0.25 for CIFAR
    lam_margin=0.08,  # degenerate to mixup when lam or 1-lam <= 0.08
    mask_up_override=None,  # If not None, override upsampling when training MixBlock
    debug=True,  # show attention and content map
    backbone=dict(
        type='ResNet_CIFAR',  # CIFAR version
        depth=18,
        num_stages=4,
        out_indices=(2,3),  # stage-3 for MixBlock, x-1: stage-x
        style='pytorch'),
    mix_block=dict(  # AutoMix
        type='PixelMixBlock',
        in_channels=256, reduction=2, use_scale=True,
        unsampling_mode=['nearest',],  # str or list
        lam_concat=False, lam_concat_v=False,  # AutoMix: no lam cat for small-scale datasets
        lam_mul=False, lam_residual=False, lam_mul_k=-1,  # SAMix lam: none
        value_neck_cfg=None,  # SAMix: non-linear value
        x_qk_concat=False, x_v_concat=False,  # SAMix x concat: none
        # att_norm_cfg=dict(type='BN'),  # norm after q,k (designed for fp16; also gives better performance in fp32)
        mask_loss_mode="L1", mask_loss_margin=0.1,  # L1 loss, 0.1
        frozen=False),
    head_one=dict(
        type='ClsHead',  # default CE
        loss=dict(type='CrossEntropyLoss', use_soft=False, use_sigmoid=False, loss_weight=1.0),
        with_avg_pool=True, multi_label=False, in_channels=512, num_classes=200),
    head_mix=dict(  # backbone & mixblock
        type='ClsMixupHead',  # mixup, default CE
        loss=dict(type='CrossEntropyLoss', use_soft=False, use_sigmoid=False, loss_weight=1.0),
        with_avg_pool=True, multi_label=False, in_channels=512, num_classes=200),
    head_weights=dict(
        head_mix_q=1, head_one_q=1, head_mix_k=1, head_one_k=1),
)

if __name__ == '__main__':
    # build the model and load checkpoint
    from openmixup.models import build_model
    from mmcv.runner import load_checkpoint
    cfg_model = model_r18_automix
    checkpoint = './openmixup/work_dirs/classification/tiny_imagenet/automix/basic/r18_l2_a2_near_mb_mlr1e_3_bb_mlr5e_2/epoch_400.pth'
    model = build_model(cfg_model)
    load_checkpoint(model, checkpoint, map_location='cpu')

    print(model)

Is this code correct? Do you have a more concise way to load the model?

zeyuanyin commented 1 year ago

There is a concise way to load the model:

  from mmcv import Config
  from mmcv.runner import load_checkpoint
  from openmixup.models import build_model

  cfg = Config.fromfile('configs/classification/tiny_imagenet/automix/basic/r18_l2_a2_near_mb_mlr1e_3_bb_mlr5e_2.py')
  model = build_model(cfg.model)
  load_checkpoint(model, 'work_dirs/classification/tiny_imagenet/automix/basic/r18_l2_a2_near_mb_mlr1e_3_bb_mlr5e_2/epoch_400.pth', map_location='cpu')
Lupin1998 commented 1 year ago

Hi @zeyuanyin, thanks for your question and the concise answer. The way you mentioned is also the official way we load a full checkpoint, as in tools/train.py. Feel free to ask me if you have more questions.
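
For your original use case (reusing the trained weights for another task), a minimal sketch is below. It assumes the query backbone's weights are stored in the checkpoint's state_dict under keys prefixed with backbone. (the momentum copies and the heads use other prefixes), so only those keys are filtered and loaded into model.backbone:

  import torch

  ckpt = torch.load(
      'work_dirs/classification/tiny_imagenet/automix/basic/r18_l2_a2_near_mb_mlr1e_3_bb_mlr5e_2/epoch_400.pth',
      map_location='cpu')
  state_dict = ckpt.get('state_dict', ckpt)  # mmcv checkpoints wrap the weights in 'state_dict'
  # keep only keys like 'backbone.conv1.weight' and strip the prefix
  backbone_state = {k[len('backbone.'):]: v
                    for k, v in state_dict.items()
                    if k.startswith('backbone.')}
  model.backbone.load_state_dict(backbone_state, strict=False)

If the downstream model is a different ResNet-18 implementation, the filtered keys may still need remapping to its own parameter names.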

zeyuanyin commented 1 year ago

Thanks a lot for your confirmation @Lupin1998

I have other questions about the output of the pretrained model.

  print(model)

  model.cuda()
  import torch
  input = torch.randn(4, 3, 64, 64).cuda()
  input.requires_grad = True
  dummy_label = torch.tensor([1, 2, 3, 4]).cuda()

  print('----train')
  model.train()
  output = model(input, gt_label=dummy_label, return_loss=True)
  print(output)

  print('----eval')
  model.eval()
  output = model(input, gt_label=dummy_label, return_loss=True)
  print(output)

The output is:

...
  (head_mix_q): ClsMixupHead(
    (criterion): CrossEntropyLoss()
    (fc): Linear(in_features=512, out_features=200, bias=True)
    (post_process): Softmax(dim=1)
  )
  (head_mix_k): ClsMixupHead(
    (criterion): CrossEntropyLoss()
    (fc): Linear(in_features=512, out_features=200, bias=True)
    (post_process): Softmax(dim=1)
  )
  (head_one_q): ClsHead(
    (criterion): CrossEntropyLoss()
    (fc): Linear(in_features=512, out_features=200, bias=True)
    (post_process): Softmax(dim=1)
  )
  (head_one_k): ClsHead(
    (criterion): CrossEntropyLoss()
    (fc): Linear(in_features=512, out_features=200, bias=True)
    (post_process): Softmax(dim=1)
  )
)
----train
{'loss': tensor(16.9432, device='cuda:0', grad_fn=<AddBackward0>), 'acc_mix_q': tensor([0.], device='cuda:0'), 'acc_one_q': tensor([0.], device='cuda:0'), 'acc_mix_k': tensor([0.], device='cuda:0')}
----eval
{'loss': tensor(27.7222, device='cuda:0', grad_fn=<AddBackward0>), 'acc_mix_q': tensor([0.], device='cuda:0'), 'acc_one_q': tensor([0.], device='cuda:0'), 'acc_mix_k': tensor([0.], device='cuda:0')}
Lupin1998 commented 1 year ago

Thanks for your detailed questions; I answer them as follows:

I hope these will be helpful to you. 😄

zeyuanyin commented 1 year ago

Thanks for your very detailed reply. My first question may not have been stated clearly: we use CrossEntropy to calculate the loss, but I want to know whether mixup is applied to the input in training mode but not in inference mode, and whether the target used is one-hot or mixed in the different modes.

Test code is

print('----args inference model')
logits = model(input, gt_label=dummy_label, mode="inference")
assert logits.shape == (4, 200)
import torch.nn as nn
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, dummy_label)
print(loss)

print('----train')
model.train()
output = model(input, gt_label=dummy_label)
print(output)

print('----eval')
model.eval()
output = model(input, gt_label=dummy_label)
print(output)

The output is:

----args inference model
tensor(5.2924, device='cuda:0', grad_fn=<NllLossBackward0>)
----train
{'loss': tensor(16.6683, device='cuda:0', grad_fn=<AddBackward0>), 'acc_mix_q': tensor([0.], device='cuda:0'), 'acc_one_q': tensor([0.], device='cuda:0'), 'acc_mix_k': tensor([0.], device='cuda:0')}
----eval
{'loss': tensor(26.5577, device='cuda:0', grad_fn=<AddBackward0>), 'acc_mix_q': tensor([0.], device='cuda:0'), 'acc_one_q': tensor([0.], device='cuda:0'), 'acc_mix_k': tensor([0.], device='cuda:0')}

In my test result, the first loss, calculated from the logits with CrossEntropyLoss, does not match the loss in model.eval() mode, so there must be a difference between the loss calculation strategies.

Lupin1998 commented 1 year ago

As for the further question, the mixup loss is applied when mode='train' in the classification head (see the ClsMixupHead class), and the testing and inference modes will also perform sample mixup. Please refer to the implementation of forward_train in AutoMixup for more details.
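
As a rough illustration of what a "mixed" target means here, below is a minimal sketch of a mixup-style cross-entropy (an illustration only, not the exact ClsMixupHead implementation):

  import torch.nn.functional as F

  def mixup_cross_entropy(logits, y_a, y_b, lam):
      # lam-weighted combination of the losses against the two original labels,
      # equivalent to cross-entropy against a lam-mixed soft target
      return lam * F.cross_entropy(logits, y_a) + (1 - lam) * F.cross_entropy(logits, y_b)

Because the targets are lam-weighted rather than one-hot, this loss will generally not equal a plain CrossEntropyLoss against the original labels.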

Lupin1998 commented 1 year ago

Hi, @zeyuanyin. I hope the above answers are helpful for your questions. If you have further questions about mixup augmentations and the AutoMix implementation, we can also discuss them through WeChat (you can find me as Lupin_1998). I will close this issue; feel free to open a new one for any other questions.