Why is whisper_tiny used for audio feature extraction?

antgroup / echomimic

EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

https://antgroup.github.io/ai/echomimic/

Apache License 2.0


Open · tszssong opened this issue 3 days ago


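Hello. Section 3.2 of the paper says the audio features are extracted with wav2vec, but the code repository uses whisper_tiny. What was the reasoning behind this choice? Does whisper_tiny give better results?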