232525 / PureT

Implementation of 'End-to-End Transformer Based Model for Image Captioning' [AAAI 2022]

About pretrained models differences #8

Open yanjialele opened 2 years ago

yanjialele commented 2 years ago

Thank you for your work. I would like to ask: is there any difference between the pre-trained model you use in the image feature extraction stage (swin_large_patch4_window12_384_22kto1k_no_head.pth) and the model provided by Swin-Transformer after fine-tuning on ImageNet-1K (swin_large_patch4_window12_384_22kto1k.pth)? I noticed that the two files differ in size. How did you obtain your pre-trained model?

Below are the Swin-Transformer pre-trained models you pointed to for download:

[Screenshot: pretrained model download table from the Swin-Transformer repository]

232525 commented 2 years ago

The swin_large_patch4_window12_384_22kto1k_no_head.pth model comes from the official pre-trained Swin Transformer checkpoint swin_large_patch4_window12_384_22kto1k.pth. Feature extraction does not need the final classification layer, so we delete the weights of the final Linear (head) layer, which is why the two files differ in size:

```python
import torch
from collections import OrderedDict

# Load the official checkpoint on CPU
model_path = './swin_large_patch4_window12_384_22kto1k.pth'
weights = torch.load(model_path, map_location='cpu')

# Copy every weight except the classification head
swin_weights = OrderedDict()
for key in weights['model'].keys():
    if 'head' in key:
        continue
    swin_weights[key] = weights['model'][key]

# Save the head-free checkpoint used for feature extraction
torch.save(swin_weights, './swin_large_patch4_window12_384_22kto1k_no_head.pth')
```

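The filtering rule above can be sanity-checked on a small dummy state dict without downloading the checkpoint; the keys below are illustrative placeholders, not the real Swin layer names:

```python
from collections import OrderedDict

# Dummy state dict standing in for weights['model']; keys are
# illustrative, not the actual Swin checkpoint layer names.
dummy_model = OrderedDict([
    ('patch_embed.proj.weight', 'tensor_a'),
    ('layers.0.blocks.0.attn.qkv.weight', 'tensor_b'),
    ('norm.weight', 'tensor_c'),
    ('head.weight', 'tensor_d'),  # classification layer to drop
    ('head.bias', 'tensor_e'),    # classification layer to drop
])

# Same rule as the snippet above: skip any key containing 'head'
no_head = OrderedDict(
    (key, value) for key, value in dummy_model.items() if 'head' not in key
)

print(list(no_head.keys()))
# -> ['patch_embed.proj.weight', 'layers.0.blocks.0.attn.qkv.weight', 'norm.weight']
```

The removed weights (a 1000-way Linear head for ImageNet-1K) account for the size difference between the two files.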
yanjialele commented 2 years ago

Understood. Thank you for your reply!