Closed corranmac closed 1 year ago
@corranmac Got it! I have updated the readme. See example.py
Thanks for such a quick response!
I was wondering if I'm able to transform these output embeddings to the same shape as CLIP's, for use in speech-image retrieval and also in image generation models trained on CLIP embeddings? I can't seem to find a separate class for encoding, e.g. model.encode() like CLIP has.
Thanks
@corranmac Yeah, you can use the semantic embedding of speech to calculate similarity with image embeddings for speech-image retrieval. In fact, this is how we do it in our paper. I have added a function to the model class (kwClip.py) for extracting the semantic embedding for a speech input: https://github.com/atosystem/SpeechCLIP/blob/e2a572d905e8b9d9365ccb8a44dca5c13a60744a/avssl/model/kwClip.py#L1299-L1315
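Once you have the semantic speech embedding from the function linked above, retrieval is just a cosine-similarity ranking against the CLIP image embeddings. Here is a minimal, self-contained sketch of that step — the embedding values and the dimension (512, as in CLIP ViT-B/32) are illustrative stand-ins, not outputs of the actual models:

```python
# Hypothetical speech-image retrieval step, assuming the speech embedding
# and the gallery of CLIP image embeddings have already been extracted.
import numpy as np

def cosine_similarity(query, gallery):
    """Cosine similarity between one query vector and a gallery matrix.

    query:   (d,)   embedding (e.g. the semantic speech embedding)
    gallery: (n, d) matrix of image embeddings
    returns: (n,)   similarity scores in [-1, 1]
    """
    query = query / np.linalg.norm(query)
    gallery = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return gallery @ query

# Toy data standing in for real embeddings.
rng = np.random.default_rng(0)
speech_emb = rng.normal(size=512)
image_embs = rng.normal(size=(5, 512))
image_embs[3] = speech_emb + 0.01 * rng.normal(size=512)  # near-match image

scores = cosine_similarity(speech_emb, image_embs)
best = int(np.argmax(scores))  # the near-match image (index 3) ranks first
```

Ranking the gallery by these scores gives you speech-to-image retrieval; swapping query and gallery gives image-to-speech.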
@corranmac If there is no further question, I will close this issue.
Hi,
Please could you provide a simple way to load a model and test a single audio clip to produce an embedding?
Thank you very much.