Synthesized mouth shape jitters rather frequently

ashawkey / RAD-NeRF

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition

MIT License


Synthesized mouth shape jitters rather frequently #30

Open yerfor opened 1 year ago

yerfor commented 1 year ago

Hello, and thank you very much for open-sourcing this excellent work!

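Following the instructions in the README, I trained trial_obama_eo from scratch and tested it with intro_eo.npy, but I ran into some problems:

The speaker's mouth shape in the generated result shows small but frequent fluctuations, which looks a bit twitchy. Could this be because aud_att_net is not being used correctly?
In a few frames of the video, the mouth region appears hollowed out, similar to clipping (mesh penetration) in CG. Could this be caused by the regularization on alpha?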

Looking forward to your reply!

https://user-images.githubusercontent.com/48365204/222040061-e4471bc9-eb4c-49de-91b9-5f3445e1fee7.mp4

aishoot commented 1 year ago

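+1. I also retrained on the Obama video, and the trained model shows the same slight mouth jitter at test time. The model released by the author does not have this problem. Where should I make adjustments?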

Gpwner commented 1 year ago

Could it be because the FPS was not set to 25 when extracting the audio features? https://github.com/ashawkey/RAD-NeRF/issues/32

ashawkey commented 1 year ago

@aishoot Hi, this is strange. What does the self-driven test video generated during training look like? Flickering can be smoothed manually with smooth_lips (which defaults to true in testing but false in training). You can adjust the smoothing strength here if there's no better way.
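
For readers unfamiliar with this kind of temporal smoothing, the general idea can be sketched as an exponential moving average over per-frame features. This is only an illustration under stated assumptions, not the repository's actual smooth_lips code: the function name `ema_smooth`, the parameter `alpha`, and the synthetic test signal are all hypothetical.

```python
import numpy as np


def ema_smooth(features, alpha=0.35):
    """Exponential moving average along the time axis.

    features: array of shape (T, D) with T >= 1, one row per frame.
    alpha:    weight of the previous smoothed frame (hypothetical
              parameter); larger values smooth more but add more lag.
    """
    features = np.asarray(features, dtype=np.float64)
    smoothed = np.empty_like(features)
    smoothed[0] = features[0]
    for t in range(1, len(features)):
        smoothed[t] = alpha * smoothed[t - 1] + (1.0 - alpha) * features[t]
    return smoothed


def jitter(x):
    """Mean absolute frame-to-frame change, a crude flicker measure."""
    return float(np.mean(np.abs(np.diff(x, axis=0))))


# Synthetic stand-in for a jittery per-frame signal (not real audio features).
rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 2.0 * np.pi, 100))[:, None]
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
smoothed = ema_smooth(noisy, alpha=0.35)

print("jitter before:", jitter(noisy))
print("jitter after: ", jitter(smoothed))
```

The trade-off is the usual one for this kind of filter: a stronger setting suppresses more frame-to-frame flicker but makes the mouth respond more sluggishly to the audio.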

aishoot commented 1 year ago

Thanks for your reply. I'll try it.

baijiesong commented 11 months ago

Can adjusting this parameter fix the mouth jitter in videos driven only by Chinese audio?