Closed: yg2024 closed this issue 2 years ago
Hi @yuangui0316, thanks for your interest. In the current version, the hidden dimension of the meta-network that computes the positional encodings is 256, so loading the checkpoints should not lead to a shape mismatch. https://github.com/ChristophReich1996/Swin-Transformer-V2/blob/3c6a5e58c59afdd5b4f26c8af085a5a69120957e/swin_transformer_v2/model_parts.py#L109
I also did not run into any issues loading the checkpoints myself. Could you please provide more details so I can reproduce this error?
However, please make sure to use the correct `input_resolution` and `window_size` when loading the checkpoints. For the CIFAR10 dataset, the input resolution is 32 and the window size is 8. For the Places365 dataset, the input resolution is 256 and the window size is 8.
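As a quick sanity check before loading, you can compute the number of rows `relative_coordinates_log` should have for a given window size. This is a minimal sketch, assuming (as the linked `model_parts.py` suggests) that the tensor holds one log-coordinate pair per pair of tokens inside a window, i.e. `window_size**2 * window_size**2` rows; the helper name is illustrative, not part of the repo.

```python
def expected_relative_coordinates_rows(window_size: int) -> int:
    """Expected rows of relative_coordinates_log, assuming one entry
    per token pair inside a (window_size x window_size) window."""
    tokens_per_window = window_size * window_size
    return tokens_per_window * tokens_per_window

# Both configurations above use window_size 8, so the model must too:
print(expected_relative_coordinates_rows(8))  # 4096, i.e. torch.Size([4096, 2])
```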
```
window_attention.relative_coordinates_log: copying a param with shape torch.Size([256, 2]) from checkpoint, the shape in current model is torch.Size([4096, 2]).
```
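For what it's worth, if `relative_coordinates_log` has `window_size**2 * window_size**2` rows (an assumption based on the linked code, not a confirmed invariant), the two shapes in this error can be inverted to recover the window sizes involved, which would point to the checkpoint and the model having been built with different window sizes. Hypothetical helper:

```python
def window_size_from_rows(rows: int) -> int:
    """Invert rows == window_size**4 to recover the window size
    (assumes one coordinate pair per token pair in a window)."""
    ws = round(rows ** 0.25)
    assert ws ** 4 == rows, f"{rows} is not a fourth power"
    return ws

print(window_size_from_rows(256))   # 4 (shape stored in the checkpoint)
print(window_size_from_rows(4096))  # 8 (shape expected by the current model)
```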