facebookresearch / TimeSformer

The official PyTorch implementation of our paper "Is Space-Time Attention All You Need for Video Understanding?"

Pretrained weights of TimeSformerHR-K600: only 8 frames? #44

Open ygfrancois opened 3 years ago

ygfrancois commented 3 years ago

Training with `TIMESFORMER.PRETRAINED_MODEL=TimeSformer_divST_16x16_448_K600.pyth` and `--cfg configs/kinetics/TimeSformer_divST_16x16_448.yaml`, I get the error below:

```
size mismatch for time_embed: copying a param with shape torch.Size([1, 8, 768]) from checkpoint, the shape in current model is torch.Size([1, 16, 768]).
```

gberta commented 3 years ago

Please check that you are referencing the correct pretrained checkpoint. I just downloaded the TimeSformer_divST_16x16_448_K600.pyth model and checked that its 'time_embed' weight is of size [1, 16, 768]. Therefore, you shouldn't be having these issues.
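For anyone who wants to run the same check locally, here is a minimal sketch (the `model_state` key and the optional `model.` prefix are assumptions about how this repo saves checkpoints; adjust if your file differs):

```python
import torch

# Load the checkpoint on CPU and inspect the stored time embedding.
ckpt = torch.load("TimeSformer_divST_16x16_448_K600.pyth", map_location="cpu")
state = ckpt.get("model_state", ckpt)  # weights may be nested under 'model_state'

# The time embedding key may or may not carry a 'model.' prefix.
key = "model.time_embed" if "model.time_embed" in state else "time_embed"
print(state[key].shape)  # a 16-frame checkpoint should print torch.Size([1, 16, 768])
```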

ygfrancois commented 3 years ago

> Please check that you are referencing the correct pretrained checkpoint. I just downloaded the TimeSformer_divST_16x16_448_K600.pyth model and checked that its 'time_embed' weight is of size [1, 16, 768]. Therefore, you shouldn't be having these issues.

Thanks for the reply, I'll try again.

wrld commented 2 years ago

Hi, have you solved the problem? I have the same issue when loading the model "TimeSformer_divST_16x16_448_K400.pyth?dl=0":

```
size mismatch for time_embed: copying a param with shape torch.Size([1, 8, 768]) from checkpoint, the shape in current model is torch.Size([1, 16, 768]).
```

amitgurintelcom commented 2 years ago

Hi, have you solved this issue? Same problem here after downloading TimeSformer_divST_16x16_448_K400.pyth?dl=0:

```
size mismatch for time_embed: copying a param with shape torch.Size([1, 8, 768]) from checkpoint, the shape in current model is torch.Size([1, 16, 768])
```

Also, after downloading TimeSformer_divST_32x32_224_HowTo100M.pyth:

```
size mismatch for time_embed: copying a param with shape torch.Size([1, 8, 768]) from checkpoint, the shape in current model is torch.Size([1, 32, 768]).
```

From the YAML and log file: `'PRETRAINED_MODEL': '.... /timesformer/models/TimeSformer_divST_32x32_224_HowTo100M.pyth'}`
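(A possible stopgap until the fix below: make the model's frame count match the checkpoint by overriding the config at launch, e.g. appending `DATA.NUM_FRAMES 8` to the `tools/run_net.py` command, assuming the SlowFast-style `opts` override this repo inherits. Note this trains an 8-frame model rather than fixing the checkpoint loading itself.)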

amitgurintelcom commented 2 years ago

I found the issue. There is a bug in vit.py, in the `__init__` of `vit_base_patch16_224(nn.Module)`. There were two bugs:

  1. The `num_frames` parameter was not passed on.
  2. The number of input channels was not updated.

Old line:

```python
load_pretrained(self.model, num_classes=self.model.num_classes, in_chans=kwargs.get('in_chans', 3), filter_fn=_conv_filter, img_size=cfg.DATA.TRAIN_CROP_SIZE, num_patches=self.num_patches, attention_type=self.attention_type, pretrained_model=pretrained_model)
```

It needs to be fixed to (the changes are `in_chans=cfg.DATA.INPUT_CHANNEL_NUM[0]` and the added `num_frames=cfg.DATA.NUM_FRAMES`):

```python
load_pretrained(self.model, num_classes=self.model.num_classes, in_chans=cfg.DATA.INPUT_CHANNEL_NUM[0], filter_fn=_conv_filter, img_size=cfg.DATA.TRAIN_CROP_SIZE, num_frames=cfg.DATA.NUM_FRAMES, num_patches=self.num_patches, attention_type=self.attention_type, pretrained_model=pretrained_model)
```
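For context on why passing `num_frames` resolves the size mismatch: with that argument, `load_pretrained` can resize the pretrained time embedding to the target number of frames instead of copying it verbatim. A minimal sketch of that resizing step (the `nearest` interpolation mode and the tensor layout are assumptions modeled on timm-style pretrained-loading helpers, not necessarily the exact code in this repo):

```python
import torch
import torch.nn.functional as F

def resize_time_embed(time_embed: torch.Tensor, num_frames: int) -> torch.Tensor:
    """Resize a pretrained time embedding [1, T_pretrained, D] to [1, num_frames, D]."""
    if time_embed.shape[1] == num_frames:
        return time_embed
    # Interpolate along the temporal axis: [1, T, D] -> [1, D, T] -> resize -> back.
    te = time_embed.transpose(1, 2)
    te = F.interpolate(te, size=num_frames, mode="nearest")
    return te.transpose(1, 2)

# Example: an 8-frame checkpoint (as in the errors above) initializing a 16-frame model.
pretrained = torch.randn(1, 8, 768)
print(resize_time_embed(pretrained, 16).shape)  # torch.Size([1, 16, 768])
```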