Thanks for your good question! Could you provide a detailed log? It may be due to the position_embedding. It is absolute and depends on the input resolution.
For a different resolution, you need interpolation. Here is some demo code:
patch_size = 14 if 'l14' in backbone else 16
num_patches = (input_resolution // patch_size) ** 2
ori_num_patches, embedding_size = new_state_dict['backbone.positional_embedding'].shape
ori_num_patches -= 1  # the first row is the class-token embedding
if num_patches != ori_num_patches:
    logger.info(f'Interpolate pos_emb from {ori_num_patches} to {num_patches}')
    weight = new_state_dict['backbone.positional_embedding']
    orig_size = int(ori_num_patches ** 0.5)
    new_size = int(num_patches ** 0.5)
    extra_tokens = weight[:1]  # keep the class-token embedding unchanged
    pos_tokens = weight[1:]    # patch-position embeddings to be resized
    # Reshape to a 2D grid, interpolate bicubically, then flatten back.
    pos_tokens = pos_tokens.reshape(1, orig_size, orig_size, embedding_size).permute(0, 3, 1, 2)
    pos_tokens = torch.nn.functional.interpolate(
        pos_tokens, size=(new_size, new_size), mode='bicubic', align_corners=False)
    pos_tokens = pos_tokens.permute(0, 2, 3, 1).reshape(num_patches, embedding_size)
    new_state_dict['backbone.positional_embedding'] = torch.cat((extra_tokens, pos_tokens), dim=0)
You can add this code before loading the state_dict.
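For illustration, here is a rough sketch of how this could be wired into a loading routine before load_state_dict is called. The function name, the checkpoint path, and the assumption that the checkpoint is a flat dict of tensors are all hypothetical; adjust them to your setup.

import torch

def load_pretrained_backbone(model, ckpt_path, input_resolution, backbone='b16'):
    # Hypothetical helper: load the checkpoint as a plain dict of tensors.
    # Depending on how the checkpoint was saved, you may need to unwrap a
    # nested key such as 'model' or 'state_dict' first.
    new_state_dict = torch.load(ckpt_path, map_location='cpu')

    # ... run the interpolation snippet above on
    # new_state_dict['backbone.positional_embedding'] here ...

    # Only load the weights after the positional embedding has been resized.
    msg = model.load_state_dict(new_state_dict, strict=False)
    return msg.missing_keys, msg.unexpected_keys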
Hi,
yes, I am getting the error due to the positional embedding.
This is the error message I get:
Exception has occurred: RuntimeError The size of tensor a (1025) must match the size of tensor b (197) at non-singleton dimension 1
Great. You can try to interpolate the weights following my code.
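To see where the two numbers in that error come from, here is a quick sketch of the arithmetic, assuming a patch size of 16, one extra class token, a 224-pretrained checkpoint, and the 512 crop size mentioned in the original question:

patch_size = 16
pretrained_tokens = (224 // patch_size) ** 2 + 1  # 14 * 14 + 1 = 197
new_tokens = (512 // patch_size) ** 2 + 1         # 32 * 32 + 1 = 1025
# The model carries 197 positional embeddings from the 224 checkpoint, while a
# 512 input produces 1025 tokens, hence the 1025-vs-197 mismatch at runtime.
print(pretrained_tokens, new_tokens)  # 197 1025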
As there is no more activity, I am closing the issue; don't hesitate to reopen it if necessary.
Hi. Thanks for this work! I'm trying to do the same. I've rewritten your snippet as follows:
patch_size = 14 if 'l14' in backbone else 16
num_patches = (cfg.DATA.TRAIN_CROP_SIZE // patch_size) ** 2
ori_num_patches, embedding_size = state_dict['backbone.positional_embedding'].shape
ori_num_patches -= 1
if num_patches != ori_num_patches:
    logger.info(f'Interpolate pos_emb from {ori_num_patches} to {num_patches}')
    weight = state_dict['backbone.positional_embedding']
    orig_size = int(ori_num_patches ** 0.5)
    new_size = int(num_patches ** 0.5)
    extra_tokens = weight[:1]
    pos_tokens = weight[1:]
    pos_tokens = pos_tokens.reshape(1, orig_size, orig_size, embedding_size).permute(0, 3, 1, 2)
    pos_tokens = torch.nn.functional.interpolate(
        pos_tokens, size=(new_size, new_size), mode='bicubic', align_corners=False)
    pos_tokens = pos_tokens.permute(0, 2, 3, 1).reshape(num_patches, embedding_size)
    state_dict['backbone.positional_embedding'] = torch.cat((extra_tokens, pos_tokens), dim=0)
self.load_state_dict(state_dict, strict=False)
But I get this error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/workspace/slowfast/utils/multiprocessing.py", line 60, in run
    ret = func(cfg)
  File "/workspace/tools/train_net.py", line 418, in train
    model = build_model(cfg)
  File "/workspace/slowfast/models/build.py", line 42, in build_model
    model = MODEL_REGISTRY.get(name)(cfg)
  File "/workspace/slowfast/models/uniformerv2.py", line 103, in __init__
    self.load_state_dict(state_dict, strict=False)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1918, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Uniformerv2:
    size mismatch for backbone.positional_embedding: copying a param with shape torch.Size([65, 768]) from checkpoint, the shape in current model is torch.Size([197, 768]).
I'm using a crop size of 132. I'm thinking this isn't the only place in the code where something should change. I'm still investigating. Can you help, please?
I'm thinking I can do it by manually changing the input_resolution value from 224 to my value, and disabling the original ViT pretraining loading. Isn't this loading useless when we load from one of your provided checkpoints? I haven't yet successfully launched the training.
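For completeness, here is a rough sketch of the arithmetic behind the 65-vs-197 mismatch above, assuming a patch size of 16: the interpolation resizes the checkpoint's embedding to the new grid, but the model itself is still constructed for a 224 input, so its own positional_embedding keeps the old size.

patch_size = 16
crop_size = 132         # resolution the checkpoint's pos_emb was interpolated to
model_resolution = 224  # resolution the model was (still) constructed with

ckpt_tokens = (crop_size // patch_size) ** 2 + 1          # 8 * 8 + 1 = 65
model_tokens = (model_resolution // patch_size) ** 2 + 1  # 14 * 14 + 1 = 197
# 65 != 197, so load_state_dict reports a size mismatch; building the model
# with input_resolution equal to the training crop size makes the two agree.
# Note that 132 is not a multiple of 16, so a stride-16 patch projection would
# also drop the last 4 pixels on each side.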
I have successfully launched the training with these steps.
Sorry for the late reply, and thanks for your effort! You can reopen the issue if you meet some problems next time, so that I can reply to you in time!
Hi, I am trying to retrain a model by loading it from a checkpoint on the SoccerNet dataset. When I change DATA.TRAIN_CROP_SIZE from 224 to 512, it gives an error about the tensor dimensions. Why is that, and how can I fix it?