Loading check point - Githubissues

Thanks for the workaround. I've tried it and it worked for the default model, however for the model you mentioned (which I downloaded from https://huggingface.co/lukewys/laion_clap/resolve/main/music_audioset_epoch_15_esc_90.14.pt) I am getting the following errors (size mismatch and missing keys)

Missing key(s) in state_dict: "audio_branch.patch_embed.mel_conv2d.weight", "audio_branch.patch_embed.mel_conv2d.bias", "audio_branch.patch_embed.fusion_model.local_att.0.weight", "audio_branch.patch_embed.fusion_model.local_att.0.bias", "audio_branch.patch_embed.fusion_model.local_att.1.weight", "audio_branch.patch_embed.fusion_model.local_att.1.bias", "audio_branch.patch_embed.fusion_model.local_att.1.running_mean", "audio_branch.patch_embed.fusion_model.local_att.1.running_var", "audio_branch.patch_embed.fusion_model.local_att.3.weight", "audio_branch.patch_embed.fusion_model.local_att.3.bias", "audio_branch.patch_embed.fusion_model.local_att.4.weight", "audio_branch.patch_embed.fusion_model.local_att.4.bias", "audio_branch.patch_embed.fusion_model.local_att.4.running_mean", "audio_branch.patch_embed.fusion_model.local_att.4.running_var", "audio_branch.patch_embed.fusion_model.global_att.1.weight", "audio_branch.patch_embed.fusion_model.global_att.1.bias", "audio_branch.patch_embed.fusion_model.global_att.2.weight", "audio_branch.patch_embed.fusion_model.global_att.2.bias", "audio_branch.patch_embed.fusion_model.global_att.2.running_mean", "audio_branch.patch_embed.fusion_model.global_att.2.running_var", "audio_branch.patch_embed.fusion_model.global_att.4.weight", "audio_branch.patch_embed.fusion_model.global_att.4.bias", "audio_branch.patch_embed.fusion_model.global_att.5.weight", "audio_branch.patch_embed.fusion_model.global_att.5.bias", "audio_branch.patch_embed.fusion_model.global_att.5.running_mean", "audio_branch.patch_embed.fusion_model.global_att.5.running_var". 
        Unexpected key(s) in state_dict: "audio_branch.layers.2.blocks.6.norm1.weight", "audio_branch.layers.2.blocks.6.norm1.bias", "audio_branch.layers.2.blocks.6.attn.relative_position_bias_table", "audio_branch.layers.2.blocks.6.attn.relative_position_index", "audio_branch.layers.2.blocks.6.attn.qkv.weight", "audio_branch.layers.2.blocks.6.attn.qkv.bias", "audio_branch.layers.2.blocks.6.attn.proj.weight", "audio_branch.layers.2.blocks.6.attn.proj.bias", "audio_branch.layers.2.blocks.6.norm2.weight", "audio_branch.layers.2.blocks.6.norm2.bias", "audio_branch.layers.2.blocks.6.mlp.fc1.weight", "audio_branch.layers.2.blocks.6.mlp.fc1.bias", "audio_branch.layers.2.blocks.6.mlp.fc2.weight", "audio_branch.layers.2.blocks.6.mlp.fc2.bias", "audio_branch.layers.2.blocks.7.attn_mask", "audio_branch.layers.2.blocks.7.norm1.weight", "audio_branch.layers.2.blocks.7.norm1.bias", "audio_branch.layers.2.blocks.7.attn.relative_position_bias_table", "audio_branch.layers.2.blocks.7.attn.relative_position_index", "audio_branch.layers.2.blocks.7.attn.qkv.weight", "audio_branch.layers.2.blocks.7.attn.qkv.bias", "audio_branch.layers.2.blocks.7.attn.proj.weight", "audio_branch.layers.2.blocks.7.attn.proj.bias", "audio_branch.layers.2.blocks.7.norm2.weight", "audio_branch.layers.2.blocks.7.norm2.bias", "audio_branch.layers.2.blocks.7.mlp.fc1.weight", "audio_branch.layers.2.blocks.7.mlp.fc1.bias", "audio_branch.layers.2.blocks.7.mlp.fc2.weight", "audio_branch.layers.2.blocks.7.mlp.fc2.bias", "audio_branch.layers.2.blocks.8.norm1.weight", "audio_branch.layers.2.blocks.8.norm1.bias", "audio_branch.layers.2.blocks.8.attn.relative_position_bias_table", "audio_branch.layers.2.blocks.8.attn.relative_position_index", "audio_branch.layers.2.blocks.8.attn.qkv.weight", "audio_branch.layers.2.blocks.8.attn.qkv.bias", "audio_branch.layers.2.blocks.8.attn.proj.weight", "audio_branch.layers.2.blocks.8.attn.proj.bias", "audio_branch.layers.2.blocks.8.norm2.weight", "audio_branch.layers.2.blocks.8.norm2.bias", "audio_branch.layers.2.blocks.8.mlp.fc1.weight", "audio_branch.layers.2.blocks.8.mlp.fc1.bias", "audio_branch.layers.2.blocks.8.mlp.fc2.weight", "audio_branch.layers.2.blocks.8.mlp.fc2.bias", "audio_branch.layers.2.blocks.9.attn_mask", "audio_branch.layers.2.blocks.9.norm1.weight", "audio_branch.layers.2.blocks.9.norm1.bias", "audio_branch.layers.2.blocks.9.attn.relative_position_bias_table", "audio_branch.layers.2.blocks.9.attn.relative_position_index", "audio_branch.layers.2.blocks.9.attn.qkv.weight", "audio_branch.layers.2.blocks.9.attn.qkv.bias", "audio_branch.layers.2.blocks.9.attn.proj.weight", "audio_branch.layers.2.blocks.9.attn.proj.bias", "audio_branch.layers.2.blocks.9.norm2.weight", "audio_branch.layers.2.blocks.9.norm2.bias", "audio_branch.layers.2.blocks.9.mlp.fc1.weight", "audio_branch.layers.2.blocks.9.mlp.fc1.bias", "audio_branch.layers.2.blocks.9.mlp.fc2.weight", "audio_branch.layers.2.blocks.9.mlp.fc2.bias", "audio_branch.layers.2.blocks.10.norm1.weight", "audio_branch.layers.2.blocks.10.norm1.bias", "audio_branch.layers.2.blocks.10.attn.relative_position_bias_table", "audio_branch.layers.2.blocks.10.attn.relative_position_index", "audio_branch.layers.2.blocks.10.attn.qkv.weight", "audio_branch.layers.2.blocks.10.attn.qkv.bias", "audio_branch.layers.2.blocks.10.attn.proj.weight", "audio_branch.layers.2.blocks.10.attn.proj.bias", "audio_branch.layers.2.blocks.10.norm2.weight", "audio_branch.layers.2.blocks.10.norm2.bias", "audio_branch.layers.2.blocks.10.mlp.fc1.weight", "audio_branch.layers.2.blocks.10.mlp.fc1.bias", "audio_branch.layers.2.blocks.10.mlp.fc2.weight", "audio_branch.layers.2.blocks.10.mlp.fc2.bias", "audio_branch.layers.2.blocks.11.attn_mask", "audio_branch.layers.2.blocks.11.norm1.weight", "audio_branch.layers.2.blocks.11.norm1.bias", "audio_branch.layers.2.blocks.11.attn.relative_position_bias_table", "audio_branch.layers.2.blocks.11.attn.relative_position_index", "audio_branch.layers.2.blocks.11.attn.qkv.weight", "audio_branch.layers.2.blocks.11.attn.qkv.bias", "audio_branch.layers.2.blocks.11.attn.proj.weight", "audio_branch.layers.2.blocks.11.attn.proj.bias", "audio_branch.layers.2.blocks.11.norm2.weight", "audio_branch.layers.2.blocks.11.norm2.bias", "audio_branch.layers.2.blocks.11.mlp.fc1.weight", "audio_branch.layers.2.blocks.11.mlp.fc1.bias", "audio_branch.layers.2.blocks.11.mlp.fc2.weight", "audio_branch.layers.2.blocks.11.mlp.fc2.bias". 
        size mismatch for audio_branch.patch_embed.proj.weight: copying a param with shape torch.Size([128, 1, 4, 4]) from checkpoint, the shape in current model is torch.Size([96, 1, 4, 4]).
        size mismatch for audio_branch.patch_embed.proj.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
        size mismatch for audio_branch.patch_embed.norm.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
        size mismatch for audio_branch.patch_embed.norm.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
        size mismatch for audio_branch.layers.0.blocks.0.norm1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
        size mismatch for audio_branch.layers.0.blocks.0.norm1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
        size mismatch for audio_branch.layers.0.blocks.0.attn.qkv.weight: copying a param with shape torch.Size([384, 128]) from checkpoint, the shape in current model is torch.Size([288, 96]).
        size mismatch for audio_branch.layers.0.blocks.0.attn.qkv.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([288]).
        size mismatch for audio_branch.layers.0.blocks.0.attn.proj.weight: copying a param with shape torch.Size([128, 128]) from checkpoint, the shape in current model is torch.Size([96, 96]).
        size mismatch for audio_branch.layers.0.blocks.0.attn.proj.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
        size mismatch for audio_branch.layers.0.blocks.0.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
        size mismatch for audio_branch.layers.0.blocks.0.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
        size mismatch for audio_branch.layers.0.blocks.0.mlp.fc1.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([384, 96]).
        size mismatch for audio_branch.layers.0.blocks.0.mlp.fc1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.0.blocks.0.mlp.fc2.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([96, 384]).
        size mismatch for audio_branch.layers.0.blocks.0.mlp.fc2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
        size mismatch for audio_branch.layers.0.blocks.1.norm1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
        size mismatch for audio_branch.layers.0.blocks.1.norm1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
        size mismatch for audio_branch.layers.0.blocks.1.attn.qkv.weight: copying a param with shape torch.Size([384, 128]) from checkpoint, the shape in current model is torch.Size([288, 96]).
        size mismatch for audio_branch.layers.0.blocks.1.attn.qkv.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([288]).
        size mismatch for audio_branch.layers.0.blocks.1.attn.proj.weight: copying a param with shape torch.Size([128, 128]) from checkpoint, the shape in current model is torch.Size([96, 96]).
        size mismatch for audio_branch.layers.0.blocks.1.attn.proj.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
        size mismatch for audio_branch.layers.0.blocks.1.norm2.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
        size mismatch for audio_branch.layers.0.blocks.1.norm2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
        size mismatch for audio_branch.layers.0.blocks.1.mlp.fc1.weight: copying a param with shape torch.Size([512, 128]) from checkpoint, the shape in current model is torch.Size([384, 96]).
        size mismatch for audio_branch.layers.0.blocks.1.mlp.fc1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.0.blocks.1.mlp.fc2.weight: copying a param with shape torch.Size([128, 512]) from checkpoint, the shape in current model is torch.Size([96, 384]).
        size mismatch for audio_branch.layers.0.blocks.1.mlp.fc2.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([96]).
        size mismatch for audio_branch.layers.0.downsample.reduction.weight: copying a param with shape torch.Size([256, 512]) from checkpoint, the shape in current model is torch.Size([192, 384]).
        size mismatch for audio_branch.layers.0.downsample.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.0.downsample.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.1.blocks.0.norm1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
        size mismatch for audio_branch.layers.1.blocks.0.norm1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
        size mismatch for audio_branch.layers.1.blocks.0.attn.qkv.weight: copying a param with shape torch.Size([768, 256]) from checkpoint, the shape in current model is torch.Size([576, 192]).
        size mismatch for audio_branch.layers.1.blocks.0.attn.qkv.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([576]).
        size mismatch for audio_branch.layers.1.blocks.0.attn.proj.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([192, 192]).
        size mismatch for audio_branch.layers.1.blocks.0.attn.proj.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
        size mismatch for audio_branch.layers.1.blocks.0.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
        size mismatch for audio_branch.layers.1.blocks.0.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
        size mismatch for audio_branch.layers.1.blocks.0.mlp.fc1.weight: copying a param with shape torch.Size([1024, 256]) from checkpoint, the shape in current model is torch.Size([768, 192]).
        size mismatch for audio_branch.layers.1.blocks.0.mlp.fc1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.layers.1.blocks.0.mlp.fc2.weight: copying a param with shape torch.Size([256, 1024]) from checkpoint, the shape in current model is torch.Size([192, 768]).
        size mismatch for audio_branch.layers.1.blocks.0.mlp.fc2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
        size mismatch for audio_branch.layers.1.blocks.1.norm1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
        size mismatch for audio_branch.layers.1.blocks.1.norm1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
        size mismatch for audio_branch.layers.1.blocks.1.attn.qkv.weight: copying a param with shape torch.Size([768, 256]) from checkpoint, the shape in current model is torch.Size([576, 192]).
        size mismatch for audio_branch.layers.1.blocks.1.attn.qkv.bias: copying a param with shape torch.Size([768]) from checkpoint, the shape in current model is torch.Size([576]).
        size mismatch for audio_branch.layers.1.blocks.1.attn.proj.weight: copying a param with shape torch.Size([256, 256]) from checkpoint, the shape in current model is torch.Size([192, 192]).
        size mismatch for audio_branch.layers.1.blocks.1.attn.proj.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
        size mismatch for audio_branch.layers.1.blocks.1.norm2.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
        size mismatch for audio_branch.layers.1.blocks.1.norm2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
        size mismatch for audio_branch.layers.1.blocks.1.mlp.fc1.weight: copying a param with shape torch.Size([1024, 256]) from checkpoint, the shape in current model is torch.Size([768, 192]).
        size mismatch for audio_branch.layers.1.blocks.1.mlp.fc1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.layers.1.blocks.1.mlp.fc2.weight: copying a param with shape torch.Size([256, 1024]) from checkpoint, the shape in current model is torch.Size([192, 768]).
        size mismatch for audio_branch.layers.1.blocks.1.mlp.fc2.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([192]).
        size mismatch for audio_branch.layers.1.downsample.reduction.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([384, 768]).
        size mismatch for audio_branch.layers.1.downsample.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.layers.1.downsample.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.layers.2.blocks.0.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.0.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.0.attn.qkv.weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([1152, 384]).
        size mismatch for audio_branch.layers.2.blocks.0.attn.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([1152]).
        size mismatch for audio_branch.layers.2.blocks.0.attn.proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([384, 384]).
        size mismatch for audio_branch.layers.2.blocks.0.attn.proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.0.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.0.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.0.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
        size mismatch for audio_branch.layers.2.blocks.0.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for audio_branch.layers.2.blocks.0.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
        size mismatch for audio_branch.layers.2.blocks.0.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.1.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.1.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.1.attn.qkv.weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([1152, 384]).
        size mismatch for audio_branch.layers.2.blocks.1.attn.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([1152]).
        size mismatch for audio_branch.layers.2.blocks.1.attn.proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([384, 384]).
        size mismatch for audio_branch.layers.2.blocks.1.attn.proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.1.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.1.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.1.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
        size mismatch for audio_branch.layers.2.blocks.1.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for audio_branch.layers.2.blocks.1.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
        size mismatch for audio_branch.layers.2.blocks.1.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.2.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.2.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.2.attn.qkv.weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([1152, 384]).
        size mismatch for audio_branch.layers.2.blocks.2.attn.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([1152]).
        size mismatch for audio_branch.layers.2.blocks.2.attn.proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([384, 384]).
        size mismatch for audio_branch.layers.2.blocks.2.attn.proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.2.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.2.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.2.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
        size mismatch for audio_branch.layers.2.blocks.2.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for audio_branch.layers.2.blocks.2.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
        size mismatch for audio_branch.layers.2.blocks.2.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.3.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.3.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.3.attn.qkv.weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([1152, 384]).
        size mismatch for audio_branch.layers.2.blocks.3.attn.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([1152]).
        size mismatch for audio_branch.layers.2.blocks.3.attn.proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([384, 384]).
        size mismatch for audio_branch.layers.2.blocks.3.attn.proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.3.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.3.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.3.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
        size mismatch for audio_branch.layers.2.blocks.3.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for audio_branch.layers.2.blocks.3.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
        size mismatch for audio_branch.layers.2.blocks.3.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.4.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.4.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.4.attn.qkv.weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([1152, 384]).
        size mismatch for audio_branch.layers.2.blocks.4.attn.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([1152]).
        size mismatch for audio_branch.layers.2.blocks.4.attn.proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([384, 384]).
        size mismatch for audio_branch.layers.2.blocks.4.attn.proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.4.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.4.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.4.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
        size mismatch for audio_branch.layers.2.blocks.4.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for audio_branch.layers.2.blocks.4.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
        size mismatch for audio_branch.layers.2.blocks.4.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.5.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.5.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.5.attn.qkv.weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([1152, 384]).
        size mismatch for audio_branch.layers.2.blocks.5.attn.qkv.bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([1152]).
        size mismatch for audio_branch.layers.2.blocks.5.attn.proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([384, 384]).
        size mismatch for audio_branch.layers.2.blocks.5.attn.proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.5.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.5.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.blocks.5.mlp.fc1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([1536, 384]).
        size mismatch for audio_branch.layers.2.blocks.5.mlp.fc1.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for audio_branch.layers.2.blocks.5.mlp.fc2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([384, 1536]).
        size mismatch for audio_branch.layers.2.blocks.5.mlp.fc2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([384]).
        size mismatch for audio_branch.layers.2.downsample.reduction.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([768, 1536]).
        size mismatch for audio_branch.layers.2.downsample.norm.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for audio_branch.layers.2.downsample.norm.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1536]).
        size mismatch for audio_branch.layers.3.blocks.0.norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.layers.3.blocks.0.norm1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.layers.3.blocks.0.attn.qkv.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([2304, 768]).
        size mismatch for audio_branch.layers.3.blocks.0.attn.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([2304]).
        size mismatch for audio_branch.layers.3.blocks.0.attn.proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
        size mismatch for audio_branch.layers.3.blocks.0.attn.proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.layers.3.blocks.0.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.layers.3.blocks.0.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.layers.3.blocks.0.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
        size mismatch for audio_branch.layers.3.blocks.0.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for audio_branch.layers.3.blocks.0.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
        size mismatch for audio_branch.layers.3.blocks.0.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.layers.3.blocks.1.norm1.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.layers.3.blocks.1.norm1.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.layers.3.blocks.1.attn.qkv.weight: copying a param with shape torch.Size([3072, 1024]) from checkpoint, the shape in current model is torch.Size([2304, 768]).
        size mismatch for audio_branch.layers.3.blocks.1.attn.qkv.bias: copying a param with shape torch.Size([3072]) from checkpoint, the shape in current model is torch.Size([2304]).
        size mismatch for audio_branch.layers.3.blocks.1.attn.proj.weight: copying a param with shape torch.Size([1024, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
        size mismatch for audio_branch.layers.3.blocks.1.attn.proj.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.layers.3.blocks.1.norm2.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.layers.3.blocks.1.norm2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.layers.3.blocks.1.mlp.fc1.weight: copying a param with shape torch.Size([4096, 1024]) from checkpoint, the shape in current model is torch.Size([3072, 768]).
        size mismatch for audio_branch.layers.3.blocks.1.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([3072]).
        size mismatch for audio_branch.layers.3.blocks.1.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([768, 3072]).
        size mismatch for audio_branch.layers.3.blocks.1.mlp.fc2.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
        size mismatch for audio_branch.tscam_conv.weight: copying a param with shape torch.Size([527, 1024, 2, 3]) from checkpoint, the shape in current model is torch.Size([527, 768, 2, 3]).
        size mismatch for audio_projection.0.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([512, 768]).

My code is simply

from src import laion_clap

model = laion_clap.CLAP_Module(enable_fusion=True)
model.load_ckpt('music_audioset_epoch_15_esc_90.14.pt')

Could you share how you tested @waldleitner?

Thanks

LAION-AI / CLAP

Loading check point #113