ZiqiaoPeng / SyncTalk

[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"
https://ziqiaopeng.github.io/synctalk/
Other

Chinese audio features #88

Closed: Tiandishihua closed this issue 1 month ago

Tiandishihua commented 2 months ago

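Hello, authors! I tested your algorithm on a talking video of the TV host Kang Hui and the results were poor. The main problem seems to be audio feature extraction for Chinese speech. I switched to DeepSpeech, but the results were still not good. Is there any way to train a digital human on Chinese talking-head videos like this? Looking forward to your reply, thanks!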

wanghx1121 commented 2 months ago

Same problem here. Could the authors please fix this or suggest an approach? Thanks! @ZiqiaoPeng

jinqiupeter commented 2 months ago

Try Hubert. Here is my result using Hubert:

https://github.com/ZiqiaoPeng/SyncTalk/assets/12045814/e5429ca3-a31a-4e32-ac1f-cdccdc579b33
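For context on the HuBERT route: HuBERT takes 16 kHz audio and emits one feature vector per 20 ms (a hop of 320 samples, i.e. 50 feature frames per second), so at 25 fps video each video frame covers two feature frames. Below is a minimal NumPy sketch of that grouping, assuming 25 fps video and the 1024-dim hidden size of HuBERT-large. The helper name group_hubert_frames and the [N, 2, 1024] layout are illustrative assumptions modeled on ER-NeRF-style hu.npy files, not a confirmed SyncTalk format, and the random array only stands in for real HuBERT output.

```python
import numpy as np

SAMPLE_RATE = 16_000  # HuBERT expects 16 kHz mono audio
HOP = 320             # feature encoder stride in samples -> 20 ms per feature frame
FPS = 25              # assumed video frame rate


def group_hubert_frames(features: np.ndarray, per_video_frame: int) -> np.ndarray:
    """Group consecutive HuBERT frames so each video frame gets `per_video_frame` of them.

    features: array of shape [T, D] (T feature frames, D hidden size).
    Returns an array of shape [ceil(T / per_video_frame), per_video_frame, D].
    """
    t, d = features.shape
    remainder = t % per_video_frame
    if remainder:
        # Pad by repeating the last frame so T becomes a multiple of per_video_frame.
        pad = np.repeat(features[-1:], per_video_frame - remainder, axis=0)
        features = np.concatenate([features, pad], axis=0)
    return features.reshape(-1, per_video_frame, d)


per_frame = (SAMPLE_RATE // HOP) // FPS  # 50 feature frames/s / 25 video frames/s = 2

feats = np.random.randn(101, 1024).astype(np.float32)  # stand-in for real HuBERT output
grouped = group_hubert_frames(feats, per_frame)
print(grouped.shape)  # (51, 2, 1024)
```

Padding by repeating the last frame is one simple choice for an odd frame count; trimming the extra frame would work just as well if the video has one frame fewer.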

Tiandishihua commented 2 months ago

Hi, could I add you on WeChat to discuss this?

jinqiupeter commented 2 months ago

Sure, jinqiuqiujin

G-force78 commented 2 months ago

Impressive! Can you set up a fork with your settings?

wennjiee commented 2 months ago

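Did you run into NaN problems when training with HuBERT? I used the ER-NeRF code to generate hu.npy as the audio input, and partway through training I got the following:

loss=nan (nan), lr=0.000647: : 1% 38/4186 [00:07<10:00, 6.91it/s]NaN or Inf found in input tensor.

While debugging I found that the input a is a normal vector at this point, but the output enc_a = self.audio_net(a) is NaN. I compared this part with the ER-NeRF code and found no difference, so I'm not sure what went wrong. I hope the authors can help. @ZiqiaoPeng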
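One way to narrow this down: if the input a is finite but enc_a = self.audio_net(a) is NaN, the non-finite values have to originate inside audio_net, typically from parameters that became NaN/Inf after an earlier step with a bad loss or gradient, or from an overflow inside the network. The sketch below illustrates that check with NumPy, using a single hypothetical linear layer as a stand-in for audio_net and a deliberately corrupted weight; in PyTorch the analogous check is running torch.isfinite over the input, each parameter, and the output. The thread does not confirm that this is the cause here.

```python
import numpy as np


def report_non_finite(name: str, arr: np.ndarray) -> bool:
    """Print a short report and return True if `arr` contains NaN or Inf."""
    bad = ~np.isfinite(arr)
    if bad.any():
        print(f"{name}: {int(bad.sum())} non-finite of {arr.size} values")
        return True
    print(f"{name}: all finite")
    return False


rng = np.random.default_rng(0)
a = rng.standard_normal((8, 32)).astype(np.float32)        # finite input, like `a` in the issue
weight = rng.standard_normal((32, 16)).astype(np.float32)  # hypothetical stand-in for audio_net's weights
weight[0, 0] = np.nan  # simulate a weight corrupted by an earlier bad update

enc_a = a @ weight  # a single linear layer standing in for self.audio_net(a)

input_bad = report_non_finite("a", a)
weight_bad = report_non_finite("audio_net weight", weight)
output_bad = report_non_finite("enc_a", enc_a)
# Finite input + non-finite weight -> non-finite output: the NaN originates inside the network.
```

A single NaN weight contaminates the whole output column it feeds, which is why one corrupted update is enough to make every subsequent forward pass produce NaN.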

yunyu commented 2 months ago

Also getting this. I have already applied the clamp before exp in the density grid fix.

wennjiee commented 2 months ago

Do you mean applying torch.exp(torch.clamp()) in network.py?
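For reference, the idea behind clamping before exp: in float32, exp(x) overflows to inf once x exceeds roughly 88.7, and an inf density then turns into NaN in later arithmetic (for example inf - inf or 0 * inf). The NumPy sketch below mirrors torch.exp(torch.clamp(x, min, max)) with np.exp(np.clip(x, lo, hi)); the helper name safe_exp and the bounds used here are placeholders for illustration, not the values used in SyncTalk's fix.

```python
import numpy as np


def safe_exp(x: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """exp() with its input clamped to [lo, hi], mirroring torch.exp(torch.clamp(x, lo, hi))."""
    return np.exp(np.clip(x, lo, hi))


x = np.array([-200.0, 0.0, 5.0, 200.0], dtype=np.float32)

with np.errstate(over="ignore"):
    raw = np.exp(x)  # exp(200) overflows float32 and becomes inf

clamped = safe_exp(x, lo=-15.0, hi=15.0)  # placeholder bounds, chosen only for illustration

# Once an inf appears, ordinary arithmetic can turn it into NaN.
with np.errstate(invalid="ignore"):
    poisoned = raw[3] - raw[3]  # inf - inf -> nan

print(raw, clamped, poisoned)
```

Clamping caps the largest value exp can produce, at the cost of zero gradient for inputs outside [lo, hi].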

qwert1887 commented 1 month ago

@wennjiee Hello, what min/max range did you use for the clamp: [-50, 50], [-9, 9], or something else?
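To put the two candidate ranges in perspective: the largest x for which exp(x) is still finite is the natural log of the dtype's largest representable value, about 88.72 for float32 and about 11.09 for float16. So an upper bound of 50 only keeps float32 results finite, while 9 also stays finite in float16. The NumPy sketch below computes these limits; whether a given exp in the training code actually runs in float16 depends on the mixed-precision setup, so this does not say definitively which range SyncTalk needs. The helper name max_safe_exp_input is hypothetical.

```python
import math

import numpy as np


def max_safe_exp_input(dtype) -> float:
    """Largest x for which exp(x) is still representable (finite) in `dtype`."""
    return math.log(float(np.finfo(dtype).max))


candidate_upper_bounds = (9.0, 50.0)  # the two upper bounds asked about above

for dtype in (np.float16, np.float32):
    limit = max_safe_exp_input(dtype)
    for hi in candidate_upper_bounds:
        print(f"{np.dtype(dtype).name}: exp({hi:g}) finite? {hi <= limit} (limit ~ {limit:.2f})")
```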

ZiqiaoPeng commented 1 month ago

The loss = NaN problem has been fixed, and HuBERT can now be used for training.