ygfrancois opened this issue 3 years ago
Please check that you are referencing the correct pretrained checkpoint. I just downloaded the TimeSformer_divST_16x16_448_K600.pyth model and verified that its 'time_embed' weight has size [1, 16, 768], so you shouldn't be hitting these issues.
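If it helps to double-check, here is a quick way to inspect what a downloaded checkpoint actually contains (a minimal sketch; the `'model_state'` wrapper key is an assumption based on PySlowFast-style checkpoints, and parameter names may carry a `model.` prefix):

```python
import torch

ckpt = torch.load("TimeSformer_divST_16x16_448_K600.pyth", map_location="cpu")
# Unwrap a possible 'model_state' wrapper; fall back to the raw dict otherwise.
state_dict = ckpt["model_state"] if "model_state" in ckpt else ckpt
for name, param in state_dict.items():
    if "time_embed" in name:
        print(name, tuple(param.shape))  # expected: (1, 16, 768) for the 16-frame model
```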
Thanks for the reply, I'll try it a second time.
Hi, have you solved the problem? I have the same issue when loading the model TimeSformer_divST_16x16_448_K400.pyth?dl=0 and get the error: `size mismatch for time_embed: copying a param with shape torch.Size([1, 8, 768]) from checkpoint, the shape in current model is torch.Size([1, 16, 768])`.
Hi, have you solved this issue? I hit the same problem after downloading TimeSformer_divST_16x16_448_K400.pyth?dl=0:

`size mismatch for time_embed: copying a param with shape torch.Size([1, 8, 768]) from checkpoint, the shape in current model is torch.Size([1, 16, 768])`

The same happens with TimeSformer_divST_32x32_224_HowTo100M.pyth:

`size mismatch for time_embed: copying a param with shape torch.Size([1, 8, 768]) from checkpoint, the shape in current model is torch.Size([1, 32, 768])`

From the yaml and log file: `'PRETRAINED_MODEL': '.... /timesformer/models/TimeSformer_divST_32x32_224_HowTo100M.pyth'},`
I found the issue. There were two bugs in vit.py, in the `__init__` of `vit_base_patch16_224(nn.Module)`.

The old line:

```python
load_pretrained(self.model, num_classes=self.model.num_classes, in_chans=kwargs.get('in_chans', 3), filter_fn=_conv_filter, img_size=cfg.DATA.TRAIN_CROP_SIZE, num_patches=self.num_patches, attention_type=self.attention_type, pretrained_model=pretrained_model)
```

needs to become (the two fixes: take `in_chans` from the config, and forward `num_frames`):

```python
load_pretrained(self.model, num_classes=self.model.num_classes, in_chans=cfg.DATA.INPUT_CHANNEL_NUM[0], filter_fn=_conv_filter, img_size=cfg.DATA.TRAIN_CROP_SIZE, num_frames=cfg.DATA.NUM_FRAMES, num_patches=self.num_patches, attention_type=self.attention_type, pretrained_model=pretrained_model)
```

Because `num_frames` was never passed, `load_pretrained` used its default (apparently 8, judging by the errors above), so the checkpoint's `time_embed` was resized to `[1, 8, 768]` before `load_state_dict`, no matter which model you loaded.
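For context, the reason `num_frames` matters is that `load_pretrained` resizes the checkpoint's temporal embedding to the target frame count before loading, roughly along these lines (a minimal sketch of that resizing step, not the repo's exact code; the interpolation mode is an assumption):

```python
import torch
import torch.nn.functional as F

def resize_time_embed(time_embed: torch.Tensor, num_frames: int) -> torch.Tensor:
    """Resize a [1, T_ckpt, D] temporal embedding to [1, num_frames, D]."""
    if time_embed.size(1) == num_frames:
        return time_embed
    # Interpolate along the temporal axis: [1, T, D] -> [1, D, T], resize, then back.
    resized = F.interpolate(time_embed.transpose(1, 2), size=num_frames, mode="nearest")
    return resized.transpose(1, 2)
```

With a default of 8, every checkpoint's `time_embed` gets squeezed to 8 frames, which is exactly the `[1, 8, 768]` shape in the error messages above.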
I am training with `TIMESFORMER.PRETRAINED_MODEL=TimeSformer_divST_16x16_448_K600.pyth` and `--cfg configs/kinetics/TimeSformer_divST_16x16_448.yaml`, but I get the error below:

`size mismatch for time_embed: copying a param with shape torch.Size([1, 8, 768]) from checkpoint, the shape in current model is torch.Size([1, 16, 768]).`
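If you are loading the model directly rather than through the training scripts, the wrapper from the repo's README makes the frame count explicit; something like this should line up with the 16-frame checkpoint (the path is a placeholder):

```python
from timesformer.models.vit import TimeSformer

# num_frames must match the checkpoint's temporal embedding
# (16 for the ..._16x16_448 models).
model = TimeSformer(img_size=448, num_classes=600, num_frames=16,
                    attention_type='divided_space_time',
                    pretrained_model='/path/to/TimeSformer_divST_16x16_448_K600.pyth')
```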