ChristophReich1996 / Swin-Transformer-V2

PyTorch reimplementation of the paper "Swin Transformer V2: Scaling Up Capacity and Resolution" [CVPR 2022].
https://arxiv.org/abs/2111.09883
MIT License

relative_coordinates_log #7

Closed yg2024 closed 2 years ago

yg2024 commented 2 years ago

window_attention.relative_coordinates_log: copying a param with shape torch.Size([256, 2]) from checkpoint, the shape in current model is torch.Size([4096, 2]).
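For context on the two shapes: in Swin V2 style implementations, a buffer like relative_coordinates_log typically holds the sign-preserving log of the relative coordinates between every pair of tokens in a window, giving a first dimension of (window_size^2)^2. Under that assumption, [256, 2] corresponds to a window size of 4 and [4096, 2] to a window size of 8, i.e. the checkpoint and the instantiated model disagree on the window size. A minimal sketch of that construction (an illustration, not necessarily the repository's exact code):

```python
import torch

def make_relative_coordinates_log(window_size: int) -> torch.Tensor:
    # (x, y) coordinates of all window_size * window_size tokens in a window.
    indexes = torch.arange(window_size)
    coordinates = torch.stack(torch.meshgrid(indexes, indexes, indexing="ij"), dim=0)
    coordinates = torch.flatten(coordinates, start_dim=1)  # [2, window_size**2]
    # Pairwise differences between all tokens: [2, N, N] with N = window_size**2.
    relative = coordinates[:, :, None] - coordinates[:, None, :]
    relative = relative.permute(1, 2, 0).reshape(-1, 2).float()  # [N * N, 2]
    # Sign-preserving log-spaced coordinates, as in Swin V2.
    return torch.sign(relative) * torch.log(1.0 + relative.abs())

print(make_relative_coordinates_log(4).shape)  # torch.Size([256, 2])
print(make_relative_coordinates_log(8).shape)  # torch.Size([4096, 2])
```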

ChristophReich1996 commented 2 years ago

Hi @yuangui0316, thanks for your interest. In the current version, the hidden dimension of the meta-network that computes the positional encodings is 256, so loading the checkpoints should not lead to a shape mismatch:

https://github.com/ChristophReich1996/Swin-Transformer-V2/blob/3c6a5e58c59afdd5b4f26c8af085a5a69120957e/swin_transformer_v2/model_parts.py#L109

I also didn't have any issues loading the checkpoints. Could you please provide more details to reproduce this error?

Please be aware that you need to use the correct input_resolution and window_size when loading the checkpoints. For the CIFAR10 dataset, the input resolution is 32 and the window size is 8; for the Places365 dataset, the input resolution is 256 and the window size is 8. A minimal loading sketch follows this comment.
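A minimal loading sketch following that advice; the factory function swin_transformer_v2_t, its keyword arguments, and the checkpoint filename are assumptions for illustration, not verified against the repository:

```python
import torch
from swin_transformer_v2 import swin_transformer_v2_t  # assumed factory function

# Instantiate the model with the settings the checkpoint was trained with.
# CIFAR10: input resolution 32, window size 8 (per the comment above).
model = swin_transformer_v2_t(input_resolution=(32, 32), window_size=8)

# A size mismatch in load_state_dict usually means that input_resolution or
# window_size above does not match what the checkpoint was trained with.
state_dict = torch.load("cifar10_checkpoint.pt", map_location="cpu")  # hypothetical path
model.load_state_dict(state_dict)
```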