Hi,
Thanks for making the code and the models publicly available. I was wondering whether there is any way to create a multilingual version of CLAP. Also, instead of using the Swin Transformer as the audio encoder, could we swap it for a multilingual encoder such as Whisper?