TMElyralab / MuseTalk

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Was the open-source lip-sync model trained from random initialization, or was the UNet pre-trained first and the lip-sync model trained from that? #95

Closed gobigrassland closed 4 months ago

gobigrassland commented 4 months ago

As far as I can see, the UNet configuration used here differs from the SD1.4 configuration only in cross_attention_dim and in_channels:
(1) Lip-sync UNet: cross_attention_dim=384, in_channels=8
(2) SD1.4 UNet: cross_attention_dim=768, in_channels=4
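The comparison can be sketched in a few lines of Python. This is only an illustration: the dictionaries hold just the two fields quoted in this comment (the rest of each config is omitted), and `config_diff` is a hypothetical helper, not part of MuseTalk or any library.

```python
# Only the two fields quoted in this thread; all other UNet settings are omitted.
musetalk_unet = {"cross_attention_dim": 384, "in_channels": 8}
sd14_unet = {"cross_attention_dim": 768, "in_channels": 4}


def config_diff(a, b):
    """Hypothetical helper: return {key: (a_value, b_value)} for differing keys."""
    keys = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


diff = config_diff(musetalk_unet, sd14_unet)
for key, (lip_sync, sd) in diff.items():
    print(f"{key}: lip-sync UNet={lip_sync}, SD1.4 UNet={sd}")
```

Both fields affect weight shapes (the input convolution and the cross-attention key/value projections), so SD1.4 weights for those layers would not match the lip-sync UNet as they are.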

czk32611 commented 4 months ago

It was trained from random initialization.