OpenGVLab / VideoMamba

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding
https://arxiv.org/abs/2403.06977
Apache License 2.0
801 stars 60 forks

How do I load the ImageNet-1K pretrained weights? #88

Open zhending111 opened 3 weeks ago

zhending111 commented 3 weeks ago

Hello, I would like to ask why, when I use VideoMamba as a backbone and load the IN1K pretrained weights (videomamba_m16_in1k_res224), I get the following message: load pretrained weights _IncompatibleKeys(missing_keys=['cls_token', 'pos_embed', 'temporal_pos_embedding', 'patch_embed.proj.weight', 'patch_embed.proj.bias', 'head.weight', 'head.bias', 'layers.0.mixer.A_log', 'layers.0.mixer.D', 'layers.0.mixer.A_b_log', 'layers.0.mixer.D_b', 'layers.0.mixer.in_proj.weight', 'layers.0.mixer.conv1d.weight', 'layers.0.mixer.conv1d.bias', 'layers.0.mixer.x_proj.weight', 'layers.0.mixer.dt_proj.weight', 'layers.0.mixer.dt_proj.bias', 'layers.0.mixer.conv1d_b.weight', 'layers.0.mixer.conv1d_b.bias', 'layers.0.mixer.x_proj_b.weight', 'layers.0.mixer.dt_proj_b.weight', 'layers.0.mixer.dt_proj_b.bias', 'layers.0.mixer.out_proj.weight', 'layers.0.norm.weight', 'layers.1.mixer.A_log', 'layers.1.mixer.D', 'layers.1.mixer.A_b_log', 'layers.1.mixer.D_b', 'layers.1.mixer.in_proj.weight', 'layers.1.mixer.conv1d.weight', 'layers.1.mixer.conv1d.bias', 'layers.1.mixer.x_proj.weight', 'layers.1.mixer.dt_proj.weight', 'layers.1.mixer.dt_proj.bias', 'layers.1.mixer.conv1d_b.weight', 'layers.1.mixer.conv1d_b.bias', 'layers.1.mixer.x_proj_b.weight', 'layers.1.mixer.dt_proj_b.weight', 'layers.1.mixer.dt_proj_b.bias', 'layers.1.mixer.out_proj.weight', 'layers.1.norm.weight', 'layers.2.mixer.A_log', 'layers.2.mixer.D', 'layers.2.mixer.A_b_log', 'layers.2.mixer.D_b', 'layers.2.mixer.in_proj.weight', 'layers.2.mixer.conv1d.weight', 'layers.2.mixer.conv1d.bias', 'layers.2.mixer.x_proj.weight', 'layers.2.mixer.dt_proj.weight', 'layers.2.mixer.dt_proj.bias', 'layers.2.mixer.conv1d_b.weight', 'layers.2.mixer.conv1d_b.bias', 'layers.2.mixer.x_proj_b.weight', 'layers.2.mixer.dt_proj_b.weight', 'layers.2.mixer.dt_proj_b.bias', 'layers.2.mixer.out_proj.weight', 'layers.2.norm.weight', 'layers.3.mixer.A_log', 'layers.3.mixer.D', 'layers.3.mixer.A_b_log', 'layers.3.mixer.D_b', 'layers.3.mixer.in_proj.weight', 'layers.3.mixer.conv1d.weight', 'layers.3.mixer.conv1d.bias', 
'layers.3.mixer.x_proj.weight', 'layers.3.mixer.dt_proj.weight', 'layers.3.mixer.dt_proj.bias', 'layers.3.mixer.conv1d_b.weight', 'layers.3.mixer.conv1d_b.bias', 'layers.3.mixer.x_proj_b.weight', 'layers.3.mixer.dt_proj_b.weight', 'layers.3.mixer.dt_proj_b.bias', 'layers.3.mixer.out_proj.weight', 'layers.3.norm.weight', 'layers.4.mixer.A_log', 'layers.4.mixer.D', 'layers.4.mixer.A_b_log', 'layers.4.mixer.D_b', 'layers.4.mixer.in_proj.weight', 'layers.4.mixer.conv1d.weight', 'layers.4.mixer.conv1d.bias', 'layers.4.mixer.x_proj.weight', 'layers.4.mixer.dt_proj.weight', 'layers.4.mixer.dt_proj.bias', 'layers.4.mixer.conv1d_b.weight', 'layers.4.mixer.conv1d_b.bias', 'layers.4.mixer.x_proj_b.weight', 'layers.4.mixer.dt_proj_b.weight', 'layers.4.mixer.dt_proj_b.bias', 'layers.4.mixer.out_proj.weight', 'layers.4.norm.weight', 'layers.5.mixer.A_log', 'layers.5.mixer.D', 'layers.5.mixer.A_b_log', 'layers.5.mixer.D_b', 'layers.5.mixer.in_proj.weight', 'layers.5.mixer.conv1d.weight', 'layers.5.mixer.conv1d.bias', 'layers.5.mixer.x_proj.weight', 'layers.5.mixer.dt_proj.weight', 'layers.5.mixer.dt_proj.bias', 'layers.5.mixer.conv1d_b.weight', 'layers.5.mixer.conv1d_b.bias', 'layers.5.mixer.x_proj_b.weight', 'layers.5.mixer.dt_proj_b.weight', 'layers.5.mixer.dt_proj_b.bias', 'layers.5.mixer.out_proj.weight', 'layers.5.norm.weight', 'layers.6.mixer.A_log', 'layers.6.mixer.D', 'layers.6.mixer.A_b_log', 'layers.6.mixer.D_b', 'layers.6.mixer.in_proj.weight', 'layers.6.mixer.conv1d.weight', 'layers.6.mixer.conv1d.bias', 'layers.6.mixer.x_proj.weight', 'layers.6.mixer.dt_proj.weight', 'layers.6.mixer.dt_proj.bias', 'layers.6.mixer.conv1d_b.weight', 'layers.6.mixer.conv1d_b.bias', 'layers.6.mixer.x_proj_b.weight', 'layers.6.mixer.dt_proj_b.weight', 'layers.6.mixer.dt_proj_b.bias', 'layers.6.mixer.out_proj.weight', 'layers.6.norm.weight', 'layers.7.mixer.A_log', 'layers.7.mixer.D', 'layers.7.mixer.A_b_log', 'layers.7.mixer.D_b', 'layers.7.mixer.in_proj.weight', 
'layers.7.mixer.conv1d.weight', 'layers.7.mixer.conv1d.bias', 'layers.7.mixer.x_proj.weight', 'layers.7.mixer.dt_proj.weight', 'layers.7.mixer.dt_proj.bias', 'layers.7.mixer.conv1d_b.weight', 'layers.7.mixer.conv1d_b.bias', 'layers.7.mixer.x_proj_b.weight', 'layers.7.mixer.dt_proj_b.weight', 'layers.7.mixer.dt_proj_b.bias', 'layers.7.mixer.out_proj.weight', 'layers.7.norm.weight', 'layers.8.mixer.A_log', 'layers.8.mixer.D', 'layers.8.mixer.A_b_log', 'layers.8.mixer.D_b', 'layers.8.mixer.in_proj.weight', 'layers.8.mixer.conv1d.weight', 'layers.8.mixer.conv1d.bias', 'layers.8.mixer.x_proj.weight', 'layers.8.mixer.dt_proj.weight', 'layers.8.mixer.dt_proj.bias', 'layers.8.mixer.conv1d_b.weight', 'layers.8.mixer.conv1d_b.bias', 'layers.8.mixer.x_proj_b.weight', 'layers.8.mixer.dt_proj_b.weight', 'layers.8.mixer.dt_proj_b.bias', 'layers.8.mixer.out_proj.weight', 'layers.8.norm.weight', 'layers.9.mixer.A_log', 'layers.9.mixer.D', 'layers.9.mixer.A_b_log', 'layers.9.mixer.D_b', 'layers.9.mixer.in_proj.weight', 'layers.9.mixer.conv1d.weight', 'layers.9.mixer.conv1d.bias', 'layers.9.mixer.x_proj.weight', 'layers.9.mixer.dt_proj.weight', 'layers.9.mixer.dt_proj.bias', 'layers.9.mixer.conv1d_b.weight', 'layers.9.mixer.conv1d_b.bias', 'layers.9.mixer.x_proj_b.weight', 'layers.9.mixer.dt_proj_b.weight', 'layers.9.mixer.dt_proj_b.bias', 'layers.9.mixer.out_proj.weight', 'layers.9.norm.weight', 'layers.10.mixer.A_log', 'layers.10.mixer.D', 'layers.10.mixer.A_b_log', 'layers.10.mixer.D_b', 'layers.10.mixer.in_proj.weight', 'layers.10.mixer.conv1d.weight', 'layers.10.mixer.conv1d.bias', 'layers.10.mixer.x_proj.weight', 'layers.10.mixer.dt_proj.weight', 'layers.10.mixer.dt_proj.bias', 'layers.10.mixer.conv1d_b.weight', 'layers.10.mixer.conv1d_b.bias', 'layers.10.mixer.x_proj_b.weight', 'layers.10.mixer.dt_proj_b.weight', 'layers.10.mixer.dt_proj_b.bias', 'layers.10.mixer.out_proj.weight', 'layers.10.norm.weight', 'layers.11.mixer.A_log', 'layers.11.mixer.D', 
'layers.11.mixer.A_b_log', 'layers.11.mixer.D_b', 'layers.11.mixer.in_proj.weight', 'layers.11.mixer.conv1d.weight', 'layers.11.mixer.conv1d.bias', 'layers.11.mixer.x_proj.weight', 'layers.11.mixer.dt_proj.weight', 'layers.11.mixer.dt_proj.bias', 'layers.11.mixer.conv1d_b.weight', 'layers.11.mixer.conv1d_b.bias', 'layers.11.mixer.x_proj_b.weight', 'layers.11.mixer.dt_proj_b.weight', 'layers.11.mixer.dt_proj_b.bias', 'layers.11.mixer.out_proj.weight', 'layers.11.norm.weight', 'layers.12.mixer.A_log', 'layers.12.mixer.D', 'layers.12.mixer.A_b_log', 'layers.12.mixer.D_b', 'layers.12.mixer.in_proj.weight', 'layers.12.mixer.conv1d.weight', 'layers.12.mixer.conv1d.bias', 'layers.12.mixer.x_proj.weight', 'layers.12.mixer.dt_proj.weight', 'layers.12.mixer.dt_proj.bias', 'layers.12.mixer.conv1d_b.weight', 'layers.12.mixer.conv1d_b.bias', 'layers.12.mixer.x_proj_b.weight', 'layers.12.mixer.dt_proj_b.weight', 'layers.12.mixer.dt_proj_b.bias', 'layers.12.mixer.out_proj.weight', 'layers.12.norm.weight', 'layers.13.mixer.A_log', 'layers.13.mixer.D', 'layers.13.mixer.A_b_log', 'layers.13.mixer.D_b', 'layers.13.mixer.in_proj.weight', 'layers.13.mixer.conv1d.weight', 'layers.13.mixer.conv1d.bias', 'layers.13.mixer.x_proj.weight', 'layers.13.mixer.dt_proj.weight', 'layers.13.mixer.dt_proj.bias', 'layers.13.mixer.conv1d_b.weight', 'layers.13.mixer.conv1d_b.bias', 'layers.13.mixer.x_proj_b.weight', 'layers.13.mixer.dt_proj_b.weight', 'layers.13.mixer.dt_proj_b.bias', 'layers.13.mixer.out_proj.weight', 'layers.13.norm.weight', 'layers.14.mixer.A_log', 'layers.14.mixer.D', 'layers.14.mixer.A_b_log', 'layers.14.mixer.D_b', 'layers.14.mixer.in_proj.weight', 'layers.14.mixer.conv1d.weight', 'layers.14.mixer.conv1d.bias', 'layers.14.mixer.x_proj.weight', 'layers.14.mixer.dt_proj.weight', 'layers.14.mixer.dt_proj.bias', 'layers.14.mixer.conv1d_b.weight', 'layers.14.mixer.conv1d_b.bias', 'layers.14.mixer.x_proj_b.weight', 'layers.14.mixer.dt_proj_b.weight', 'layers.14.mixer.dt_proj_b.bias', 
'layers.14.mixer.out_proj.weight', 'layers.14.norm.weight', 'layers.15.mixer.A_log', 'layers.15.mixer.D', 'layers.15.mixer.A_b_log', 'layers.15.mixer.D_b', 'layers.15.mixer.in_proj.weight', 'layers.15.mixer.conv1d.weight', 'layers.15.mixer.conv1d.bias', 'layers.15.mixer.x_proj.weight', 'layers.15.mixer.dt_proj.weight', 'layers.15.mixer.dt_proj.bias', 'layers.15.mixer.conv1d_b.weight', 'layers.15.mixer.conv1d_b.bias', 'layers.15.mixer.x_proj_b.weight', 'layers.15.mixer.dt_proj_b.weight', 'layers.15.mixer.dt_proj_b.bias', 'layers.15.mixer.out_proj.weight', 'layers.15.norm.weight', 'layers.16.mixer.A_log', 'layers.16.mixer.D', 'layers.16.mixer.A_b_log', 'layers.16.mixer.D_b', 'layers.16.mixer.in_proj.weight', 'layers.16.mixer.conv1d.weight', 'layers.16.mixer.conv1d.bias', 'layers.16.mixer.x_proj.weight', 'layers.16.mixer.dt_proj.weight', 'layers.16.mixer.dt_proj.bias', 'layers.16.mixer.conv1d_b.weight', 'layers.16.mixer.conv1d_b.bias', 'layers.16.mixer.x_proj_b.weight', 'layers.16.mixer.dt_proj_b.weight', 'layers.16.mixer.dt_proj_b.bias', 'layers.16.mixer.out_proj.weight', 'layers.16.norm.weight', 'layers.17.mixer.A_log', 'layers.17.mixer.D', 'layers.17.mixer.A_b_log', 'layers.17.mixer.D_b', 'layers.17.mixer.in_proj.weight', 'layers.17.mixer.conv1d.weight', 'layers.17.mixer.conv1d.bias', 'layers.17.mixer.x_proj.weight', 'layers.17.mixer.dt_proj.weight', 'layers.17.mixer.dt_proj.bias', 'layers.17.mixer.conv1d_b.weight', 'layers.17.mixer.conv1d_b.bias', 'layers.17.mixer.x_proj_b.weight', 'layers.17.mixer.dt_proj_b.weight', 'layers.17.mixer.dt_proj_b.bias', 'layers.17.mixer.out_proj.weight', 'layers.17.norm.weight', 'layers.18.mixer.A_log', 'layers.18.mixer.D', 'layers.18.mixer.A_b_log', 'layers.18.mixer.D_b', 'layers.18.mixer.in_proj.weight', 'layers.18.mixer.conv1d.weight', 'layers.18.mixer.conv1d.bias', 'layers.18.mixer.x_proj.weight', 'layers.18.mixer.dt_proj.weight', 'layers.18.mixer.dt_proj.bias', 'layers.18.mixer.conv1d_b.weight', 'layers.18.mixer.conv1d_b.bias', 
'layers.18.mixer.x_proj_b.weight', 'layers.18.mixer.dt_proj_b.weight', 'layers.18.mixer.dt_proj_b.bias', 'layers.18.mixer.out_proj.weight', 'layers.18.norm.weight', 'layers.19.mixer.A_log', 'layers.19.mixer.D', 'layers.19.mixer.A_b_log', 'layers.19.mixer.D_b', 'layers.19.mixer.in_proj.weight', 'layers.19.mixer.conv1d.weight', 'layers.19.mixer.conv1d.bias', 'layers.19.mixer.x_proj.weight', 'layers.19.mixer.dt_proj.weight', 'layers.19.mixer.dt_proj.bias', 'layers.19.mixer.conv1d_b.weight', 'layers.19.mixer.conv1d_b.bias', 'layers.19.mixer.x_proj_b.weight', 'layers.19.mixer.dt_proj_b.weight', 'layers.19.mixer.dt_proj_b.bias', 'layers.19.mixer.out_proj.weight', 'layers.19.norm.weight', 'layers.20.mixer.A_log', 'layers.20.mixer.D', 'layers.20.mixer.A_b_log', 'layers.20.mixer.D_b', 'layers.20.mixer.in_proj.weight', 'layers.20.mixer.conv1d.weight', 'layers.20.mixer.conv1d.bias', 'layers.20.mixer.x_proj.weight', 'layers.20.mixer.dt_proj.weight', 'layers.20.mixer.dt_proj.bias', 'layers.20.mixer.conv1d_b.weight', 'layers.20.mixer.conv1d_b.bias', 'layers.20.mixer.x_proj_b.weight', 'layers.20.mixer.dt_proj_b.weight', 'layers.20.mixer.dt_proj_b.bias', 'layers.20.mixer.out_proj.weight', 'layers.20.norm.weight', 'layers.21.mixer.A_log', 'layers.21.mixer.D', 'layers.21.mixer.A_b_log', 'layers.21.mixer.D_b', 'layers.21.mixer.in_proj.weight', 'layers.21.mixer.conv1d.weight', 'layers.21.mixer.conv1d.bias', 'layers.21.mixer.x_proj.weight', 'layers.21.mixer.dt_proj.weight', 'layers.21.mixer.dt_proj.bias', 'layers.21.mixer.conv1d_b.weight', 'layers.21.mixer.conv1d_b.bias', 'layers.21.mixer.x_proj_b.weight', 'layers.21.mixer.dt_proj_b.weight', 'layers.21.mixer.dt_proj_b.bias', 'layers.21.mixer.out_proj.weight', 'layers.21.norm.weight', 'layers.22.mixer.A_log', 'layers.22.mixer.D', 'layers.22.mixer.A_b_log', 'layers.22.mixer.D_b', 'layers.22.mixer.in_proj.weight', 'layers.22.mixer.conv1d.weight', 'layers.22.mixer.conv1d.bias', 'layers.22.mixer.x_proj.weight', 
'layers.22.mixer.dt_proj.weight', 'layers.22.mixer.dt_proj.bias', 'layers.22.mixer.conv1d_b.weight', 'layers.22.mixer.conv1d_b.bias', 'layers.22.mixer.x_proj_b.weight', 'layers.22.mixer.dt_proj_b.weight', 'layers.22.mixer.dt_proj_b.bias', 'layers.22.mixer.out_proj.weight', 'layers.22.norm.weight', 'layers.23.mixer.A_log', 'layers.23.mixer.D', 'layers.23.mixer.A_b_log', 'layers.23.mixer.D_b', 'layers.23.mixer.in_proj.weight', 'layers.23.mixer.conv1d.weight', 'layers.23.mixer.conv1d.bias', 'layers.23.mixer.x_proj.weight', 'layers.23.mixer.dt_proj.weight', 'layers.23.mixer.dt_proj.bias', 'layers.23.mixer.conv1d_b.weight', 'layers.23.mixer.conv1d_b.bias', 'layers.23.mixer.x_proj_b.weight', 'layers.23.mixer.dt_proj_b.weight', 'layers.23.mixer.dt_proj_b.bias', 'layers.23.mixer.out_proj.weight', 'layers.23.norm.weight', 'layers.24.mixer.A_log', 'layers.24.mixer.D', 'layers.24.mixer.A_b_log', 'layers.24.mixer.D_b', 'layers.24.mixer.in_proj.weight', 'layers.24.mixer.conv1d.weight', 'layers.24.mixer.conv1d.bias', 'layers.24.mixer.x_proj.weight', 'layers.24.mixer.dt_proj.weight', 'layers.24.mixer.dt_proj.bias', 'layers.24.mixer.conv1d_b.weight', 'layers.24.mixer.conv1d_b.bias', 'layers.24.mixer.x_proj_b.weight', 'layers.24.mixer.dt_proj_b.weight', 'layers.24.mixer.dt_proj_b.bias', 'layers.24.mixer.out_proj.weight', 'layers.24.norm.weight', 'layers.25.mixer.A_log', 'layers.25.mixer.D', 'layers.25.mixer.A_b_log', 'layers.25.mixer.D_b', 'layers.25.mixer.in_proj.weight', 'layers.25.mixer.conv1d.weight', 'layers.25.mixer.conv1d.bias', 'layers.25.mixer.x_proj.weight', 'layers.25.mixer.dt_proj.weight', 'layers.25.mixer.dt_proj.bias', 'layers.25.mixer.conv1d_b.weight', 'layers.25.mixer.conv1d_b.bias', 'layers.25.mixer.x_proj_b.weight', 'layers.25.mixer.dt_proj_b.weight', 'layers.25.mixer.dt_proj_b.bias', 'layers.25.mixer.out_proj.weight', 'layers.25.norm.weight', 'layers.26.mixer.A_log', 'layers.26.mixer.D', 'layers.26.mixer.A_b_log', 'layers.26.mixer.D_b', 
'layers.26.mixer.in_proj.weight', 'layers.26.mixer.conv1d.weight', 'layers.26.mixer.conv1d.bias', 'layers.26.mixer.x_proj.weight', 'layers.26.mixer.dt_proj.weight', 'layers.26.mixer.dt_proj.bias', 'layers.26.mixer.conv1d_b.weight', 'layers.26.mixer.conv1d_b.bias', 'layers.26.mixer.x_proj_b.weight', 'layers.26.mixer.dt_proj_b.weight', 'layers.26.mixer.dt_proj_b.bias', 'layers.26.mixer.out_proj.weight', 'layers.26.norm.weight', 'layers.27.mixer.A_log', 'layers.27.mixer.D', 'layers.27.mixer.A_b_log', 'layers.27.mixer.D_b', 'layers.27.mixer.in_proj.weight', 'layers.27.mixer.conv1d.weight', 'layers.27.mixer.conv1d.bias', 'layers.27.mixer.x_proj.weight', 'layers.27.mixer.dt_proj.weight', 'layers.27.mixer.dt_proj.bias', 'layers.27.mixer.conv1d_b.weight', 'layers.27.mixer.conv1d_b.bias', 'layers.27.mixer.x_proj_b.weight', 'layers.27.mixer.dt_proj_b.weight', 'layers.27.mixer.dt_proj_b.bias', 'layers.27.mixer.out_proj.weight', 'layers.27.norm.weight', 'layers.28.mixer.A_log', 'layers.28.mixer.D', 'layers.28.mixer.A_b_log', 'layers.28.mixer.D_b', 'layers.28.mixer.in_proj.weight', 'layers.28.mixer.conv1d.weight', 'layers.28.mixer.conv1d.bias', 'layers.28.mixer.x_proj.weight', 'layers.28.mixer.dt_proj.weight', 'layers.28.mixer.dt_proj.bias', 'layers.28.mixer.conv1d_b.weight', 'layers.28.mixer.conv1d_b.bias', 'layers.28.mixer.x_proj_b.weight', 'layers.28.mixer.dt_proj_b.weight', 'layers.28.mixer.dt_proj_b.bias', 'layers.28.mixer.out_proj.weight', 'layers.28.norm.weight', 'layers.29.mixer.A_log', 'layers.29.mixer.D', 'layers.29.mixer.A_b_log', 'layers.29.mixer.D_b', 'layers.29.mixer.in_proj.weight', 'layers.29.mixer.conv1d.weight', 'layers.29.mixer.conv1d.bias', 'layers.29.mixer.x_proj.weight', 'layers.29.mixer.dt_proj.weight', 'layers.29.mixer.dt_proj.bias', 'layers.29.mixer.conv1d_b.weight', 'layers.29.mixer.conv1d_b.bias', 'layers.29.mixer.x_proj_b.weight', 'layers.29.mixer.dt_proj_b.weight', 'layers.29.mixer.dt_proj_b.bias', 'layers.29.mixer.out_proj.weight', 
'layers.29.norm.weight', 'layers.30.mixer.A_log', 'layers.30.mixer.D', 'layers.30.mixer.A_b_log', 'layers.30.mixer.D_b', 'layers.30.mixer.in_proj.weight', 'layers.30.mixer.conv1d.weight', 'layers.30.mixer.conv1d.bias', 'layers.30.mixer.x_proj.weight', 'layers.30.mixer.dt_proj.weight', 'layers.30.mixer.dt_proj.bias', 'layers.30.mixer.conv1d_b.weight', 'layers.30.mixer.conv1d_b.bias', 'layers.30.mixer.x_proj_b.weight', 'layers.30.mixer.dt_proj_b.weight', 'layers.30.mixer.dt_proj_b.bias', 'layers.30.mixer.out_proj.weight', 'layers.30.norm.weight', 'layers.31.mixer.A_log', 'layers.31.mixer.D', 'layers.31.mixer.A_b_log', 'layers.31.mixer.D_b', 'layers.31.mixer.in_proj.weight', 'layers.31.mixer.conv1d.weight', 'layers.31.mixer.conv1d.bias', 'layers.31.mixer.x_proj.weight', 'layers.31.mixer.dt_proj.weight', 'layers.31.mixer.dt_proj.bias', 'layers.31.mixer.conv1d_b.weight', 'layers.31.mixer.conv1d_b.bias', 'layers.31.mixer.x_proj_b.weight', 'layers.31.mixer.dt_proj_b.weight', 'layers.31.mixer.dt_proj_b.bias', 'layers.31.mixer.out_proj.weight', 'layers.31.norm.weight', 'norm_f.weight'], unexpected_keys=['model'])

zhending111 commented 3 weeks ago

My initialization looks like this: self.video_mamba = videomamba_middle(pretrained=True,num_frames=5)
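One reading of the log above: every model parameter is reported missing and the only unexpected key is 'model', which is the pattern you would see if the checkpoint file stores its parameters nested under a top-level 'model' entry and the whole wrapper dict is passed to load_state_dict. The sketch below illustrates that diagnosis with plain dictionaries. The helper names unwrap_state_dict and diff_keys, the wrapper-key list, and the toy values are assumptions made for illustration; they are not part of the VideoMamba code base.

```python
from typing import Any, Iterable, List, Mapping, Tuple


def unwrap_state_dict(
    checkpoint: Mapping[str, Any],
    wrapper_keys: Iterable[str] = ("model", "state_dict"),  # assumed wrapper names
) -> Mapping[str, Any]:
    """Return the nested parameter dict if the checkpoint wraps it, else the checkpoint itself."""
    for key in wrapper_keys:
        inner = checkpoint.get(key)
        if isinstance(inner, Mapping):
            return inner
    return checkpoint


def diff_keys(
    model_keys: Iterable[str], checkpoint_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Mimic the missing/unexpected key report of a non-strict state-dict load."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    return sorted(model_set - ckpt_set), sorted(ckpt_set - model_set)


# Toy stand-ins: a wrapped "checkpoint" and the keys a model expects.
wrapped = {"model": {"cls_token": 0, "norm_f.weight": 1}}
model_keys = ["cls_token", "norm_f.weight", "temporal_pos_embedding"]

# Passing the wrapper directly reproduces the symptom in the log.
print(diff_keys(model_keys, wrapped))
# (['cls_token', 'norm_f.weight', 'temporal_pos_embedding'], ['model'])

# After unwrapping, only keys truly absent from the checkpoint remain.
print(diff_keys(model_keys, unwrap_state_dict(wrapped)))
# (['temporal_pos_embedding'], [])
```

With a real checkpoint, the same idea would be to call torch.load(path, map_location="cpu"), pass the result through unwrap_state_dict, and then call load_state_dict(..., strict=False) on the model. Whether an image-pretrained checkpoint also contains video-specific entries such as temporal_pos_embedding, or whether their shapes match num_frames=5, is not something the log confirms, so any keys still reported after unwrapping should be checked separately.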