LAION-AI / CLAP

Contrastive Language-Audio Pretraining
https://arxiv.org/abs/2211.06687
Creative Commons Zero v1.0 Universal

3 larger CLAP models not able to load #153

Open MultiTrickFox opened 5 months ago

MultiTrickFox commented 5 months ago

Hello, my approach is pretty straightforward: I run the default code with only the checkpoint changed:

import laion_clap

model = laion_clap.CLAP_Module(enable_fusion=False, amodel='HTSAT-base')
model.load_ckpt('music_audioset_epoch_15_esc_90.14.pt')

But I get the error:

Error(s) in loading state_dict for CLAP:
    Unexpected key(s) in state_dict: "text_branch.embeddings.position_ids". 

This error is present in all the newly trained models:

- Music: music_audioset_epoch_15_esc_90.14.pt
- Music and speech: music_speech_epoch_15_esc_89.25.pt
- Speech, music, and general audio: music_speech_audioset_epoch_15_esc_89.98.pt

How can I correctly load these models? Thanks.

tbrouns commented 3 weeks ago

See: https://github.com/LAION-AI/CLAP/issues/127#issuecomment-1736948673
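For anyone hitting this before following the link: the unexpected text_branch.embeddings.position_ids key typically comes from a transformers version mismatch. The checkpoints were saved with a transformers release that still registered position_ids as a buffer in the text encoder embeddings, while newer releases no longer expect that key. Below is a minimal sketch of one possible workaround, not necessarily the fix from the linked comment. It assumes CLAP_Module exposes the underlying network as .model and that the checkpoint stores its weights either at the top level or under a state_dict key, so treat it as illustrative.

import torch
import laion_clap

model = laion_clap.CLAP_Module(enable_fusion=False, amodel='HTSAT-base')

# Load the checkpoint manually so the stale key can be dropped before loading.
ckpt = torch.load('music_audioset_epoch_15_esc_90.14.pt', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)  # weights may or may not be wrapped

cleaned = {}
for k, v in state_dict.items():
    # Strip a possible DDP "module." prefix.
    k = k[len('module.'):] if k.startswith('module.') else k
    # Drop the buffer that newer transformers versions no longer expect.
    if k.endswith('position_ids'):
        continue
    cleaned[k] = v

model.model.load_state_dict(cleaned, strict=False)

Alternatively, installing an older transformers release (one contemporaneous with the checkpoints) avoids the mismatch without touching the state dict.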