Westlake-AI / openmixup

CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark
https://openmixup.readthedocs.io
Apache License 2.0

How to load .pth into the model #45

Closed: zeyuanyin closed this issue 1 year ago

zeyuanyin commented 1 year ago

If I use bash tools/dist_train.sh configs/classification/tiny_imagenet/automix/basic/r18_l2_a2_near_mb_mlr1e_3_bb_mlr5e_2.py 1 to get the trained weights epoch_400.pth, how can I load epoch_400.pth into the ResNet-18 model for another task?

# https://github.com/Westlake-AI/openmixup/blob/main/configs/classification/tiny_imagenet/automix/basic/r18_l2_a2_near_mb_mlr1e_3_bb_mlr5e_2.py
model_r18_automix = dict(
    type='AutoMixup',
    pretrained=None,
    alpha=2.0,
    momentum=0.999,  # 0.999 to 0.999999
    mask_layer=2,
    mask_loss=0.1,  # using mask loss
    mask_adjust=0,  # prob of adjusting bb mask in terms of lam by mixup, 0.25 for CIFAR
    lam_margin=0.08,  # degenerate to mixup when lam or 1-lam <= 0.08
    mask_up_override=None,  # If not None, override upsampling when training MixBlock
    debug=True,  # show attention and content map
    backbone=dict(
        type='ResNet_CIFAR',  # CIFAR version
        depth=18,
        num_stages=4,
        out_indices=(2,3),  # stage-3 for MixBlock, x-1: stage-x
        style='pytorch'),
    mix_block=dict(  # AutoMix
        type='PixelMixBlock',
        in_channels=256, reduction=2, use_scale=True,
        unsampling_mode=['nearest',],  # str or list
        lam_concat=False, lam_concat_v=False,  # AutoMix: no lam cat for small-scale datasets
        lam_mul=False, lam_residual=False, lam_mul_k=-1,  # SAMix lam: none
        value_neck_cfg=None,  # SAMix: non-linear value
        x_qk_concat=False, x_v_concat=False,  # SAMix x concat: none
        # att_norm_cfg=dict(type='BN'),  # norm after q,k (designed for fp16; also gives better performance in fp32)
        mask_loss_mode="L1", mask_loss_margin=0.1,  # L1 loss, 0.1
        frozen=False),
    head_one=dict(
        type='ClsHead',  # default CE
        loss=dict(type='CrossEntropyLoss', use_soft=False, use_sigmoid=False, loss_weight=1.0),
        with_avg_pool=True, multi_label=False, in_channels=512, num_classes=200),
    head_mix=dict(  # backbone & mixblock
        type='ClsMixupHead',  # mixup, default CE
        loss=dict(type='CrossEntropyLoss', use_soft=False, use_sigmoid=False, loss_weight=1.0),
        with_avg_pool=True, multi_label=False, in_channels=512, num_classes=200),
    head_weights=dict(
        head_mix_q=1, head_one_q=1, head_mix_k=1, head_one_k=1),
)

if __name__ == '__main__':
    # build the model and load checkpoint
    from openmixup.models import build_model
    from mmcv.runner import load_checkpoint
    cfg_model = model_r18_automix
    checkpoint = './openmixup/work_dirs/classification/tiny_imagenet/automix/basic/r18_l2_a2_near_mb_mlr1e_3_bb_mlr5e_2/epoch_400.pth'
    model = build_model(cfg_model)
    load_checkpoint(model, checkpoint, map_location='cpu')

    print(model)

Is this code correct? Do you have a more concise way to load the model?

zeyuanyin commented 1 year ago

There is a concise way to load the model:

  from mmcv import Config
  from mmcv.runner import load_checkpoint
  from openmixup.models import build_model

  cfg = Config.fromfile('configs/classification/tiny_imagenet/automix/basic/r18_l2_a2_near_mb_mlr1e_3_bb_mlr5e_2.py')
  model = build_model(cfg.model)
  load_checkpoint(model, 'work_dirs/classification/tiny_imagenet/automix/basic/r18_l2_a2_near_mb_mlr1e_3_bb_mlr5e_2/epoch_400.pth', map_location='cpu')
Lupin1998 commented 1 year ago

Hi @zeyuanyin, thanks for your question and the concise answer. The way you mentioned is also the official way we load a full checkpoint, as in tools/train.py. Feel free to ask me if you have more questions.
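
For your original use case (reusing the trained weights for another task), a minimal sketch is below. It assumes the query backbone's weights are stored in the checkpoint's state_dict under keys prefixed with backbone. (the momentum copies and the heads use other prefixes), so only those keys are filtered and loaded into model.backbone:

  import torch

  ckpt = torch.load(
      'work_dirs/classification/tiny_imagenet/automix/basic/r18_l2_a2_near_mb_mlr1e_3_bb_mlr5e_2/epoch_400.pth',
      map_location='cpu')
  state_dict = ckpt.get('state_dict', ckpt)  # mmcv checkpoints wrap the weights in 'state_dict'
  # keep only keys like 'backbone.conv1.weight' and strip the prefix
  backbone_state = {k[len('backbone.'):]: v
                    for k, v in state_dict.items()
                    if k.startswith('backbone.')}
  model.backbone.load_state_dict(backbone_state, strict=False)

If the downstream model is a different ResNet-18 implementation, the filtered keys may still need remapping to its own parameter names.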

zeyuanyin commented 1 year ago

Thanks a lot for your confirmation @Lupin1998

I have other questions about the output of the pretrained model.

  print(model)

  model.cuda()
  import torch
  input = torch.randn(4, 3, 64, 64).cuda()
  input.requires_grad = True
  dummy_label = torch.tensor([1, 2, 3, 4]).cuda()

  print('----train')
  model.train()
  output = model(input, gt_label=dummy_label, return_loss=True)
  print(output)

  print('----eval')
  model.eval()
  output = model(input, gt_label=dummy_label, return_loss=True)
  print(output)

The output is:

...
  (head_mix_q): ClsMixupHead(
    (criterion): CrossEntropyLoss()
    (fc): Linear(in_features=512, out_features=200, bias=True)
    (post_process): Softmax(dim=1)
  )
  (head_mix_k): ClsMixupHead(
    (criterion): CrossEntropyLoss()
    (fc): Linear(in_features=512, out_features=200, bias=True)
    (post_process): Softmax(dim=1)
  )
  (head_one_q): ClsHead(
    (criterion): CrossEntropyLoss()
    (fc): Linear(in_features=512, out_features=200, bias=True)
    (post_process): Softmax(dim=1)
  )
  (head_one_k): ClsHead(
    (criterion): CrossEntropyLoss()
    (fc): Linear(in_features=512, out_features=200, bias=True)
    (post_process): Softmax(dim=1)
  )
)
----train
{'loss': tensor(16.9432, device='cuda:0', grad_fn=<AddBackward0>), 'acc_mix_q': tensor([0.], device='cuda:0'), 'acc_one_q': tensor([0.], device='cuda:0'), 'acc_mix_k': tensor([0.], device='cuda:0')}
----eval
{'loss': tensor(27.7222, device='cuda:0', grad_fn=<AddBackward0>), 'acc_mix_q': tensor([0.], device='cuda:0'), 'acc_one_q': tensor([0.], device='cuda:0'), 'acc_mix_k': tensor([0.], device='cuda:0')}
Lupin1998 commented 1 year ago

Thanks for your detailed questions; I answer them as follows:

I hope these will be helpful to you. 😄

zeyuanyin commented 1 year ago

Thanks for your very detailed reply. My first question may not have been stated clearly: we use CrossEntropy to calculate the loss, but I want to know whether mixup is applied to the input in training mode but not in inference mode, and whether the target used is one-hot or mixed in the different modes.

Test code is

print('----args inference model')
logits = model(input, gt_label=dummy_label, mode="inference")
assert logits.shape == (4, 200)
import torch.nn as nn
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, dummy_label)
print(loss)

print('----train')
model.train()
output = model(input, gt_label=dummy_label)
print(output)

print('----eval')
model.eval()
output = model(input, gt_label=dummy_label)
print(output)

The output is:

----args inference model
tensor(5.2924, device='cuda:0', grad_fn=<NllLossBackward0>)
----train
{'loss': tensor(16.6683, device='cuda:0', grad_fn=<AddBackward0>), 'acc_mix_q': tensor([0.], device='cuda:0'), 'acc_one_q': tensor([0.], device='cuda:0'), 'acc_mix_k': tensor([0.], device='cuda:0')}
----eval
{'loss': tensor(26.5577, device='cuda:0', grad_fn=<AddBackward0>), 'acc_mix_q': tensor([0.], device='cuda:0'), 'acc_one_q': tensor([0.], device='cuda:0'), 'acc_mix_k': tensor([0.], device='cuda:0')}

In my test result, the first loss, calculated from the logits with CrossEntropyLoss, does not match the loss in model.eval() mode, so there must be a difference between the loss calculation strategies.

Lupin1998 commented 1 year ago

As for the further question, the mixup loss is applied when mode='train' in the classification head (see the ClsMixupHead class), and the testing and inference modes will also perform sample mixup. Please refer to the implementation of forward_train in AutoMixup for more details.
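
As a rough illustration of what a "mixed" target means here, below is a minimal sketch of a mixup-style cross-entropy (an illustration only, not the exact ClsMixupHead implementation):

  import torch.nn.functional as F

  def mixup_cross_entropy(logits, y_a, y_b, lam):
      # lam-weighted combination of the losses against the two original labels,
      # equivalent to cross-entropy against a lam-mixed soft target
      return lam * F.cross_entropy(logits, y_a) + (1 - lam) * F.cross_entropy(logits, y_b)

Because the targets are lam-weighted rather than one-hot, this loss will generally not equal a plain CrossEntropyLoss against the original labels.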

Lupin1998 commented 1 year ago

Hi, @zeyuanyin. I hope the above answers are helpful for your questions. If you have further questions about mixup augmentations and the AutoMix implementation, we can also discuss them through WeChat (you can find me as Lupin_1998). I will close this issue; feel free to open a new one for any other questions.