Open happylittlecat2333 opened 1 year ago
Hi!
Thank you for your question. Unfortunately, the Hugging Face implementation is not maintained by us (the CLAP authors) but by Hugging Face researchers (I think they are Younes Belkada and Arthur Zucker).
It would be better to open this issue in the Hugging Face transformers repo. That said, I believe our pip library should offer the same functionality, so if your code is not largely based on Hugging Face transformers, you are welcome to use our pip library (see the README for more details).
Thanks for your reply! My work depends largely on Hugging Face style models such as laion/clap-htsat-unfused, so it would be convenient to be able to switch to another CLAP model, such as music-clap, with the same code. I will open an issue in the Hugging Face transformers repo. Thanks a lot!
PS: Since there are already open Hugging Face models such as laion/clap-htsat-unfused under LAION, I think it would be very convenient for users if your team converted the other CLAP models to Hugging Face style and uploaded them under LAION, just like the CLIP model collection there. :)
Question Description
I want to use Hugging Face style models, but I can only find "laion/clap-htsat-unfused" and "laion/clap-htsat-fused" among the Hugging Face models. I would like to use a music CLAP model such as music_speech_epoch_15_esc_89.25.pt, so I found https://github.com/huggingface/transformers/blob/main/src/transformers/models/clap/convert_clap_original_pytorch_to_hf.py to convert the CLAP model. However, the newly updated models are based on the `HTSAT-base` model, for which `hidden_size` and `patch_embeds_hidden_size` are different. So I revised `convert_clap_original_pytorch_to_hf.py` as below. But after testing three models (covering both `HTSAT-base` and `HTSAT-tiny` based models), I see an accuracy drop for the `HTSAT-base` models. Can you please help me find the problem, and maybe convert your newly updated models to Hugging Face style and upload them?
My revised `convert_clap_original_pytorch_to_hf.py` conversion script:
My evaluation on ESC50 (adapted from your eval code):
Evaluation Results
630k-audioset-best (before convert, HTSAT-tiny type)
630k-audioset-best (after convert, HTSAT-tiny type)
music_audioset_epoch_15_esc_90.14 (before convert, HTSAT-base type)
music_audioset_epoch_15_esc_90.14 (after convert, HTSAT-base type)
music_speech_audioset_epoch_15_esc_89.98 (before convert, HTSAT-base type)
music_speech_audioset_epoch_15_esc_89.98 (after convert, HTSAT-base type)
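For readers who want to reproduce this kind of comparison, here is a minimal sketch of how a zero-shot top-1 accuracy can be computed from audio and text embeddings. It assumes the metric is plain top-1 accuracy over cosine similarities between each clip embedding and one text embedding per class; the actual eval code may differ. The function names and the toy embeddings are hypothetical and are not taken from the CLAP or transformers code.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def zero_shot_top1(
    audio_embeds: List[Sequence[float]],
    text_embeds: List[Sequence[float]],
    labels: List[int],
) -> float:
    """Fraction of clips whose most similar class text is the true label."""
    correct = 0
    for embed, label in zip(audio_embeds, labels):
        scores = [cosine(embed, text) for text in text_embeds]
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


# Toy data (assumption): two classes, four clips, 2-dimensional embeddings.
text_embeds = [[1.0, 0.0], [0.0, 1.0]]
audio_embeds = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]]
labels = [0, 1, 0, 1]

accuracy = zero_shot_top1(audio_embeds, text_embeds, labels)
print(accuracy)
```

Running the same function on embeddings from the original checkpoint and from the converted model, with identical audio inputs and class prompts, would keep the two numbers directly comparable.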
Therefore, we can see that the `HTSAT-base` models show an accuracy drop after conversion to Hugging Face format. Could you please help us figure out this bug, and maybe upload Hugging Face versions of the CLAP models music_speech_epoch_15_esc_89.25.pt and music_speech_audioset_epoch_15_esc_89.98.pt? Thanks!
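One way to narrow down a drop like this might be to compare parameter names and shapes between the original checkpoint and the converted model, since a hyperparameter that still carries an `HTSAT-tiny` default (beyond `hidden_size` and `patch_embeds_hidden_size`) could leave some weights unloaded or differently shaped. The sketch below is only an illustration: `diff_param_shapes` is a hypothetical helper, the parameter names are made up rather than real CLAP keys, and it assumes both sides have already been mapped to the same naming scheme.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_param_shapes(
    original: Dict[str, Shape], converted: Dict[str, Shape]
) -> Dict[str, List[str]]:
    """Report parameters that are missing, unexpected, or differently shaped."""
    missing = sorted(set(original) - set(converted))
    unexpected = sorted(set(converted) - set(original))
    mismatched = sorted(
        name
        for name in set(original) & set(converted)
        if original[name] != converted[name]
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}


# Toy shapes with hypothetical parameter names (assumption, not real CLAP keys).
original_shapes = {
    "audio.patch_embed.weight": (128, 1, 4, 4),
    "audio.layers.2.blocks.11.attn.weight": (512, 512),
    "audio.proj.weight": (512, 1024),
}
converted_shapes = {
    "audio.patch_embed.weight": (96, 1, 4, 4),
    "audio.proj.weight": (512, 1024),
}

report = diff_param_shapes(original_shapes, converted_shapes)
print(report)
```

With PyTorch models, the input dictionaries could be built as `{k: tuple(v.shape) for k, v in model.state_dict().items()}`. Any entry under "missing" or "mismatched" would point to a weight that the conversion did not carry over as expected.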