OpenGVLab / UniFormerV2

[ICCV2023] UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
https://arxiv.org/abs/2211.09552
Apache License 2.0

DATA.TRAIN_CROP_SIZE gives an error #12

Closed AmmaraRazzaq closed 1 year ago

AmmaraRazzaq commented 1 year ago

Hi, I am trying to retrain a model by loading it from a checkpoint on the SoccerNet dataset. When I change DATA.TRAIN_CROP_SIZE from 224 to 512, I get an error about the dimensions of a tensor. Why is that, and how can I fix it?

Andy1621 commented 1 year ago

Thanks for your good question! Could you provide a detailed log? It may be due to the positional embedding, which is absolute and depends on the input resolution.

For a different resolution, you need interpolation. Here is some demo code:

    patch_size = 14 if 'l14' in backbone else 16
    num_patches = (input_resolution // patch_size) ** 2
    ori_num_patches, embedding_size = new_state_dict['backbone.positional_embedding'].shape
    ori_num_patches -= 1
    if num_patches != ori_num_patches:
        logger.info(f'Interpolate pos_emb from {ori_num_patches} to {num_patches}')
        weight = new_state_dict['backbone.positional_embedding']
        orig_size = int(ori_num_patches ** 0.5)
        new_size = int(num_patches ** 0.5)
        extra_tokens = weight[:1]
        pos_tokens = weight[1:]
        pos_tokens = pos_tokens.reshape(1, orig_size, orig_size, embedding_size).permute(0, 3, 1, 2)
        pos_tokens = torch.nn.functional.interpolate(
            pos_tokens, size=(new_size, new_size), mode='bicubic', align_corners=False)
        pos_tokens = pos_tokens.permute(0, 2, 3, 1).reshape(num_patches, embedding_size)
        new_state_dict['backbone.positional_embedding'] = torch.cat((extra_tokens, pos_tokens), dim=0)

You can add the code before loading state_dict.
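For completeness, here is the same idea as a self-contained helper you can call on the checkpoint dict. This is only a sketch: the function name interpolate_pos_embed, the default key 'backbone.positional_embedding', and the default patch size of 16 are illustrative assumptions, so adapt them to your checkpoint.

    import torch
    import torch.nn.functional as F

    def interpolate_pos_embed(state_dict, new_resolution, patch_size=16,
                              key='backbone.positional_embedding'):
        """Resize an absolute positional embedding to a new square input resolution.

        Assumes the embedding is stored as [1 class token + H*W patch tokens, dim],
        as in the snippet above.
        """
        weight = state_dict[key]                      # (1 + ori_num_patches, dim)
        ori_num_patches, dim = weight.shape
        ori_num_patches -= 1                          # drop the class token
        num_patches = (new_resolution // patch_size) ** 2
        if num_patches == ori_num_patches:            # nothing to do
            return state_dict

        orig_size = int(ori_num_patches ** 0.5)
        new_size = int(num_patches ** 0.5)
        cls_token, pos_tokens = weight[:1], weight[1:]
        # (N, dim) -> (1, dim, H, W) so that bicubic interpolation acts spatially
        pos_tokens = pos_tokens.reshape(1, orig_size, orig_size, dim).permute(0, 3, 1, 2)
        pos_tokens = F.interpolate(pos_tokens, size=(new_size, new_size),
                                   mode='bicubic', align_corners=False)
        # back to (new_num_patches, dim) and re-attach the class token
        pos_tokens = pos_tokens.permute(0, 2, 3, 1).reshape(num_patches, dim)
        state_dict[key] = torch.cat((cls_token, pos_tokens), dim=0)
        return state_dict

You would then call something like state_dict = interpolate_pos_embed(state_dict, cfg.DATA.TRAIN_CROP_SIZE) right before load_state_dict(state_dict, strict=False).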

AmmaraRazzaq commented 1 year ago

Hi,

Yes, I am getting the error due to the positional embedding here.

This is the error message I get:

Exception has occurred: RuntimeError The size of tensor a (1025) must match the size of tensor b (197) at non-singleton dimension 1
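(For reference, those numbers are consistent with the absolute positional embedding: assuming the base patch size of 16, a 512 crop gives (512 / 16)² + 1 = 1025 tokens including the class token, while the 224 pretrained embedding has (224 / 16)² + 1 = 197.)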

Andy1621 commented 1 year ago

Great. You can try to interpolate the weights following my code.

Andy1621 commented 1 year ago

As there is no more activity, I am closing the issue. Don't hesitate to reopen it if necessary.

dmenig commented 1 year ago

Hi. Thanks for this work! I'm trying to do the same. I've rewritten your snippet as follows:

            patch_size = 14 if 'l14' in backbone else 16
            num_patches = (cfg.DATA.TRAIN_CROP_SIZE // patch_size) ** 2
            ori_num_patches, embedding_size = state_dict['backbone.positional_embedding'].shape
            ori_num_patches -= 1
            if num_patches != ori_num_patches:
                logger.info(f'Interpolate pos_emb from {ori_num_patches} to {num_patches}')
                weight = state_dict['backbone.positional_embedding']
                orig_size = int(ori_num_patches ** 0.5)
                new_size = int(num_patches ** 0.5)
                extra_tokens = weight[:1]
                pos_tokens = weight[1:]
                pos_tokens = pos_tokens.reshape(1, orig_size, orig_size, embedding_size).permute(0, 3, 1, 2)
                pos_tokens = torch.nn.functional.interpolate(
                    pos_tokens, size=(new_size, new_size), mode='bicubic', align_corners=False)
                pos_tokens = pos_tokens.permute(0, 2, 3, 1).reshape(num_patches, embedding_size)
                state_dict['backbone.positional_embedding'] = torch.cat((extra_tokens, pos_tokens), dim=0)
            self.load_state_dict(state_dict, strict=False)

But I get this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/workspace/slowfast/utils/multiprocessing.py", line 60, in run
    ret = func(cfg)
  File "/workspace/tools/train_net.py", line 418, in train
    model = build_model(cfg)
  File "/workspace/slowfast/models/build.py", line 42, in build_model
    model = MODEL_REGISTRY.get(name)(cfg)
  File "/workspace/slowfast/models/uniformerv2.py", line 103, in __init__
    self.load_state_dict(state_dict, strict=False)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1918, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Uniformerv2:
        size mismatch for backbone.positional_embedding: copying a param with shape torch.Size([65, 768]) from checkpoint, the shape in current model is torch.Size([197, 768]).

I'm using a crop size of 132. I'm thinking this isn't the only place where something in the code should change. I'm still investigating. Can you help, please?
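For context on the two shapes in the traceback: with 16-pixel patches, interpolating to a 132 crop gives (132 // 16)² + 1 = 65 tokens, while a backbone still built with input_resolution=224 keeps (224 / 16)² + 1 = 197, so the interpolated checkpoint no longer matches the freshly constructed model. A minimal sanity check before loading might look like the sketch below; the key name and the use of print() are assumptions, not code from the repo.

    import torch.nn as nn

    def check_pos_embed(model: nn.Module, state_dict: dict,
                        key: str = 'backbone.positional_embedding') -> None:
        """Warn if the checkpoint's positional embedding no longer matches the model."""
        model_pe = model.state_dict()[key]   # e.g. torch.Size([197, 768]) for a 224 backbone
        ckpt_pe = state_dict[key]            # e.g. torch.Size([65, 768]) after interpolating to 132
        if model_pe.shape != ckpt_pe.shape:
            print(f'positional_embedding mismatch: model {tuple(model_pe.shape)} '
                  f'vs checkpoint {tuple(ckpt_pe.shape)} -- was the backbone '
                  f'still built with input_resolution=224?')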

dmenig commented 1 year ago

I'm thinking I can do it by manually changing the input_resolution value from 224 to my value, and disabling the loading of the original ViT pretraining. Isn't this loading redundant when we load from one of your provided checkpoints? I haven't yet successfully launched the training.

dmenig commented 1 year ago

I have successfully launched the training with these steps.

Andy1621 commented 1 year ago

Sorry for the late reply, and thanks for trying it out! You can reopen the issue if you run into problems next time, so that I can reply to you in time!