YuanGongND / cav-mae

Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
BSD 2-Clause "Simplified" License

Error when loading the CAV-MAE model #1

Open pelegshilo opened 1 year ago

pelegshilo commented 1 year ago

Hello,

I am trying to fine-tune CAV-MAE for an audio classification task, and I loaded the model according to the provided snippet (a rough sketch of my loading code is at the end of this comment). However, when I do so, I get the following error:

```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_27/285712233.py in <module>
      4 audio_model = CAVMAE(audio_length=1024,
      5                      modality_specific_depth=11,
----> 6                      norm_pix_loss=True, tr_pos=False)
      7
      8 mdl_weight = torch.load(model_path, map_location=CFG.device)

/kaggle/working/cav-mae/src/models/cav_mae.py in __init__(self, img_size, audio_length, patch_size, in_chans, embed_dim, modality_specific_depth, num_heads, decoder_embed_dim, decoder_depth, decoder_num_heads, mlp_ratio, norm_layer, norm_pix_loss, tr_pos)
     93
     94         # audio-branch
---> 95         self.blocks_a = nn.ModuleList([Block(embed_dim, num_heads, mlp_ratio, qkv_bias=True, qk_scale=None, norm_layer=norm_layer) for i in range(modality_specific_depth)])
     96         # visual-branch
     97         self.blocks_v = nn.ModuleList([Block(embed_dim, num_heads, mlp_ratio, qkv_bias=True, qk_scale=None, norm_layer=norm_layer) for i in range(modality_specific_depth)])

/kaggle/working/cav-mae/src/models/cav_mae.py in <listcomp>(.0)
     93
     94         # audio-branch
---> 95         self.blocks_a = nn.ModuleList([Block(embed_dim, num_heads, mlp_ratio, qkv_bias=True, qk_scale=None, norm_layer=norm_layer) for i in range(modality_specific_depth)])
     96         # visual-branch
     97         self.blocks_v = nn.ModuleList([Block(embed_dim, num_heads, mlp_ratio, qkv_bias=True, qk_scale=None, norm_layer=norm_layer) for i in range(modality_specific_depth)])

/kaggle/working/cav-mae/src/models/cav_mae.py in __init__(self, dim, num_heads, mlp_ratio, qkv_bias, qk_scale, drop, attn_drop, drop_path, act_layer, norm_layer)
     41         self.norm1_v = norm_layer(dim)
     42         self.attn = Attention(
---> 43             dim, num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop)
     44         # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
     45         self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

TypeError: __init__() got an unexpected keyword argument 'qk_scale'
```

I get a similar error when trying to load the CAV-MAE-FT model for AudioSet.
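For reference, here is a rough sketch of the loading code in question, reconstructed from the traceback above. The import path, the checkpoint path, and the device are placeholders from my notebook setup (the original code used `model_path` and `CFG.device`), so they may need adjusting.

```python
import torch

# adjust this import to wherever cav_mae.py lives in your clone of the repo;
# in my Kaggle kernel, /kaggle/working/cav-mae/src is on sys.path
from models.cav_mae import CAVMAE

model_path = "path/to/cav-mae-checkpoint.pth"  # placeholder: your checkpoint location
device = torch.device("cpu")                   # stand-in for CFG.device in my notebook

# constructing the encoder is where the TypeError is raised
audio_model = CAVMAE(audio_length=1024,
                     modality_specific_depth=11,
                     norm_pix_loss=True, tr_pos=False)

mdl_weight = torch.load(model_path, map_location=device)
# ...followed by loading mdl_weight into audio_model, as in the provided snippet
```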

YuanGongND commented 1 year ago

The bug is due to your Attention class being different from ours: the timm version you have installed no longer accepts the qk_scale argument that our Block passes. Please check your timm package version; it should be timm==0.4.5.
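A quick way to confirm this (an illustrative check, not part of the repo): inspect the signature of the Attention class that cav_mae.py imports from timm and see whether it still accepts qk_scale.

```python
import inspect

import timm
from timm.models.vision_transformer import Attention  # the class cav_mae.py relies on

print(timm.__version__)  # should print 0.4.5 for this repo

# timm==0.4.5 accepts qk_scale; newer releases removed it, which is exactly
# what the TypeError above is complaining about
print('qk_scale' in inspect.signature(Attention.__init__).parameters)
```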

-Yuan

YuanGongND commented 1 year ago

To do so, use

import timm
print(timm.__version__)

The easiest way to reproduce this repo is to install the same environment as ours; we provide the package list at https://github.com/YuanGongND/cav-mae/blob/master/requirements_a5.txt.

In your virtual environment, run `pip install -r requirements_a5.txt`.