There is a concise way to load the model:
from mmcv import Config
from mmcv.runner import load_checkpoint
from openmixup.models import build_model
# build the model from its config, then load the trained checkpoint
cfg = Config.fromfile('configs/classification/tiny_imagenet/automix/basic/r18_l2_a2_near_mb_mlr1e_3_bb_mlr5e_2.py')
model = build_model(cfg.model)
load_checkpoint(model, 'work_dirs/classification/tiny_imagenet/automix/basic/r18_l2_a2_near_mb_mlr1e_3_bb_mlr5e_2/epoch_400.pth', map_location='cpu')
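As a side note on reusing the trained weights for another task, here is a minimal sketch (not from the thread) that assumes the mmcv-style checkpoint stores its weights under a 'state_dict' key and that the query backbone parameters use the 'backbone.' prefix; check the actual keys first, and note that my_resnet18 is a placeholder for your target network:
import torch
# load the raw checkpoint and unwrap the mmcv-style 'state_dict' if present
ckpt = torch.load('work_dirs/classification/tiny_imagenet/automix/basic/r18_l2_a2_near_mb_mlr1e_3_bb_mlr5e_2/epoch_400.pth', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)
# keep only the (assumed) query backbone weights and strip the 'backbone.' prefix
backbone_sd = {k[len('backbone.'):]: v for k, v in state_dict.items() if k.startswith('backbone.')}
# load them into your own ResNet-18; strict=False tolerates head/key mismatches,
# but key names may still need remapping for a different ResNet implementation
# my_resnet18.load_state_dict(backbone_sd, strict=False)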
Hi @zeyuanyin, thanks for your question and the concise answer. The way you mentioned is also the official way we load a full checkpoint, as in tools/train.py. Feel free to ask me if you have more questions.
Thanks a lot for your confirmation @Lupin1998
I have other questions about the output of the pretrained model.
print(model)
model.cuda()
import torch
# dummy Tiny-ImageNet-sized input: a batch of 4 images (3x64x64) with dummy labels
input = torch.randn(4, 3, 64, 64).cuda()
input.requires_grad = True
dummy_label = torch.tensor([1, 2, 3, 4]).cuda()
print('----train')
model.train()
output = model(input, gt_label=dummy_label, return_loss=True)
print(output)
print('----eval')
model.eval()
output = model(input, gt_label=dummy_label, return_loss=True)
print(output)
The output is:
...
(head_mix_q): ClsMixupHead(
(criterion): CrossEntropyLoss()
(fc): Linear(in_features=512, out_features=200, bias=True)
(post_process): Softmax(dim=1)
)
(head_mix_k): ClsMixupHead(
(criterion): CrossEntropyLoss()
(fc): Linear(in_features=512, out_features=200, bias=True)
(post_process): Softmax(dim=1)
)
(head_one_q): ClsHead(
(criterion): CrossEntropyLoss()
(fc): Linear(in_features=512, out_features=200, bias=True)
(post_process): Softmax(dim=1)
)
(head_one_k): ClsHead(
(criterion): CrossEntropyLoss()
(fc): Linear(in_features=512, out_features=200, bias=True)
(post_process): Softmax(dim=1)
)
)
----train
{'loss': tensor(16.9432, device='cuda:0', grad_fn=<AddBackward0>), 'acc_mix_q': tensor([0.], device='cuda:0'), 'acc_one_q': tensor([0.], device='cuda:0'), 'acc_mix_k': tensor([0.], device='cuda:0')}
----eval
{'loss': tensor(27.7222, device='cuda:0', grad_fn=<AddBackward0>), 'acc_mix_q': tensor([0.], device='cuda:0'), 'acc_one_q': tensor([0.], device='cuda:0'), 'acc_mix_k': tensor([0.], device='cuda:0')}
What do acc_mix_q, acc_one_q, and acc_mix_k mean?
Thanks for your detailed questions; I answer them as follows:
As for the classification loss, you should use model.train() during training and model.eval() for testing. The main difference between train() and eval() is the behaviour of BN and Dropout (e.g., refer to https://zhuanlan.zhihu.com/p/357075502 for a detailed explanation).
You can choose the mode of the classification model with mode=xxx (refer to base_model.py), e.g., get the logits directly with mode="inference":
output = model(input, gt_label=dummy_label, mode="inference") # the logit of the first head
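As a quick sanity check on those logits, a plain-PyTorch sketch (not part of the repo; argmax gives the same prediction whether or not the head's Softmax post_process has already been applied) could be:
pred = output.argmax(dim=1)                  # output has shape (4, 200); predicted class per sample
top1 = (pred == dummy_label).float().mean()  # top-1 accuracy on this dummy batch
print(pred, top1)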
Meanwhile, the basic method of AutoMix variants (using the AutoMixup class) is different from other Mixup methods (using the MixupClassification class): forward_inference returns the logits of the first classification head, while AutoMixup outputs the logits of four classification heads. We report the better result among them (usually acc_mix_k) for AutoMix experiments using find_automix_val_median. You might rewrite forward_inference for AutoMixup to get the logits of all four heads.
As for a normal classifier using the MixupClassification class, the testing results (mode="test") will be "head0_top1": xxx, where head0 denotes the first classification head and topX denotes the top-X accuracy. As for the AutoMix classifier using the AutoMixup class, the output will be acc_mix_q_topX, acc_mix_k_topX, acc_one_q_topX, and acc_one_k_topX, i.e., the top-X accuracy of the corresponding head (mix denotes the mixup head and one denotes the one-hot head). Please refer to the AutoMix paper for details.
I hope these will be helpful to you. 😄
Thanks for your very detailed reply. It seems my first question was not stated clearly: we use CrossEntropy to calculate the loss, but I want to know whether mixup is applied to the input in training mode and not in inference mode, and whether the target used is one-hot or mixed in the different modes.
The test code is:
import torch.nn as nn
print('----args inference model')
logits = model(input, gt_label=dummy_label, mode="inference")
assert logits.shape == (4, 200)  # batch of 4, 200 Tiny-ImageNet classes
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, dummy_label)
print(loss)
print('----train')
model.train()
output = model(input, gt_label=dummy_label)
print(output)
print('----eval')
model.eval()
output = model(input, gt_label=dummy_label)
print(output)
The output is:
----args inference model
tensor(5.2924, device='cuda:0', grad_fn=<NllLossBackward0>)
----train
{'loss': tensor(16.6683, device='cuda:0', grad_fn=<AddBackward0>), 'acc_mix_q': tensor([0.], device='cuda:0'), 'acc_one_q': tensor([0.], device='cuda:0'), 'acc_mix_k': tensor([0.], device='cuda:0')}
----eval
{'loss': tensor(26.5577, device='cuda:0', grad_fn=<AddBackward0>), 'acc_mix_q': tensor([0.], device='cuda:0'), 'acc_one_q': tensor([0.], device='cuda:0'), 'acc_mix_k': tensor([0.], device='cuda:0')}
In my test results, the loss calculated from the inference logits with CrossEntropyLoss does not match the loss returned in model.eval() mode, so there is a difference between the loss calculation strategies.
As for the further question, the mixup loss is applied under mode=train in the classification head (see the ClsMixupHead class), while the testing and inference modes do not perform sample mixup. Please refer to the implementation of forward_train in AutoMixup for more details.
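To make the one-hot vs. mixed target distinction concrete, here is a generic mixup-style cross-entropy sketch in plain PyTorch (a toy illustration with made-up shapes and a stand-in classifier, not the exact OpenMixup implementation):
import torch
import torch.nn as nn
# toy setup: two batches of images and their integer (one-hot) labels
x_a, x_b = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
y_a, y_b = torch.randint(0, 200, (4,)), torch.randint(0, 200, (4,))
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 200))  # stand-in model
lam = 0.7                             # mixing ratio, normally drawn from Beta(alpha, alpha)
mixed = lam * x_a + (1 - lam) * x_b   # mixed inputs are only used in training mode
logits = classifier(mixed)
criterion = nn.CrossEntropyLoss()
# the "mixed" target is the lam-weighted combination of the two one-hot targets
loss = lam * criterion(logits, y_a) + (1 - lam) * criterion(logits, y_b)
print(loss)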
Hi, @zeyuanyin. I guess the above answers are helpful for your questions. If you have further questions about mixup augmentations and the AutoMix implementation, maybe we can discuss them through WeChat (you can find me as Lupin_1998). I will close this issue; you can open a new one for other questions.
If I use
bash tools/dist_train.sh configs/classification/tiny_imagenet/automix/basic/r18_l2_a2_near_mb_mlr1e_3_bb_mlr5e_2.py 1
to get the trained weights epoch_400.pth, how can I load epoch_400.pth into the ResNet-18 model for another task? Is this code correct? Do you have a more concise way to load the model?