atosystem / SpeechCLIP

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022
https://atosystem.github.io/blogs/speechclip
BSD 3-Clause "New" or "Revised" License

About training code #5

Closed: Benjizhang closed this issue 11 months ago

Benjizhang commented 1 year ago

Thank you very much for this pioneering and inspiring work. I would like to follow and reproduce your work, especially the part about how to train Parallel SpeechCLIP, but I could not find it in the repository.

Could you roughly outline the training code: how to call the proposed speech encoder and CLIP's image encoder, how the contrastive loss is then computed, and how backpropagation is performed? Alternatively, could you point to some reference code or blog posts about the training code?

Thank you very much!
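For readers with the same question, a minimal sketch of a generic CLIP-style symmetric contrastive (InfoNCE) loss between a batch of speech embeddings and a batch of image embeddings is shown below. This is not the SpeechCLIP authors' training code. It assumes the speech and image encoders have already produced one fixed-size vector per utterance and per image, that row i of each batch is a matched pair, and that the temperature of 0.07 is an illustrative value. The helper names are hypothetical. It uses only the standard library, so it computes the loss value but no gradients; in an autograd framework such as PyTorch, the same computation on tensors would be followed by `loss.backward()` and an optimizer step.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row: Vector) -> Vector:
    """Numerically stable log-softmax over one list of logits."""
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum_exp for x in row]


def clip_contrastive_loss(
    speech_emb: List[Vector], image_emb: List[Vector], temperature: float = 0.07
) -> float:
    """Hypothetical helper: symmetric InfoNCE where speech_emb[i] pairs with image_emb[i]."""
    s = [l2_normalize(v) for v in speech_emb]
    im = [l2_normalize(v) for v in image_emb]
    n = len(s)
    # logits[i][j] = cosine similarity of speech i and image j, divided by the temperature
    logits = [
        [sum(a * b for a, b in zip(s[i], im[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Speech -> image: each row is a softmax over all images; the target is the diagonal
    loss_s2i = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Image -> speech: each column is a softmax over all utterances
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_i2s = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    # Average the two directions, as in CLIP
    return (loss_s2i + loss_i2s) / 2


if __name__ == "__main__":
    speech = [[1.0, 0.0], [0.0, 1.0]]
    print(clip_contrastive_loss(speech, [[1.0, 0.0], [0.0, 1.0]]))  # matched pairs: near 0
    print(clip_contrastive_loss(speech, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs: large
```

Averaging the speech-to-image and image-to-speech terms treats retrieval in both directions equally, which is the usual design choice for CLIP-style objectives.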

Benjizhang commented 11 months ago

I have already completed this project. Thanks for your work as well.