What are syncnet_64mouth.pth and syncnet_128mouth.pth actually used for?

MRzzm / DINet

The source code of "DINet: deformation inpainting network for realistic face visually dubbing on high resolution video."

What are syncnet_64mouth.pth and syncnet_128mouth.pth actually used for? #101

Closed tailangjun closed 7 months ago

tailangjun commented 7 months ago

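The pretrained models include syncnet_64mouth.pth and syncnet_128mouth.pth, but they don't seem to be used anywhere. frame_64mouth.pth and frame_128mouth.pth, on the other hand, are needed, because the training strategy is coarse-to-fine. I'm not sure whether my understanding is correct.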

flysky126 commented 7 months ago

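A question about this sync model: after you replaced the audio, is the loss during training still computed on the dimensions it outputs? Roughly how low does the loss get?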

tailangjun commented 7 months ago

Yes, it is computed with MSE. The syncnet Loss_Sync can drop to 0.24, and the clip Loss_perception can drop to 2.05.
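For readers unfamiliar with this kind of loss, here is a minimal sketch of an MSE-based sync loss. It is not taken from the DINet code: the function name `mse_sync_loss`, the constant target of 1.0 for a well-synced clip, and the example scores are all assumptions made for illustration.

```python
def mse_sync_loss(sync_scores, target=1.0):
    """Mean squared error between sync scores and a constant target.

    sync_scores: iterable of floats (hypothetical sync-network outputs).
    target: score a perfectly synced clip should get (assumed to be 1.0).
    """
    scores = list(sync_scores)
    if not scores:
        raise ValueError("sync_scores must not be empty")
    # Average of the squared differences between each score and the target.
    return sum((s - target) ** 2 for s in scores) / len(scores)


# Hypothetical scores for a batch of three generated clips.
example_scores = [0.5, 0.6, 0.4]
example_loss = mse_sync_loss(example_scores)
```

Under these assumptions, and only if no extra loss weight is applied, a loss of about 0.24 would mean the scores sit roughly 0.49 away from the target in root-mean-square terms.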

flysky126 commented 7 months ago

> Yes, it is computed with MSE. The syncnet loss can drop to 0.24, and the clip loss can drop to 0.205.

I trained it too. The loss dropped from 0.4 to this value, and it dropped very quickly, so I'm not sure whether that is right.

tailangjun commented 7 months ago

Yes, it drops quickly at the start. Looking forward to your good results.