Chinese audio features - Githubissues

ZiqiaoPeng / SyncTalk

[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"

https://ziqiaopeng.github.io/synctalk/

1.07k stars, 119 forks

Chinese audio features #88

Closed. Tiandishihua closed this issue 1 month ago.

Tiandishihua commented 2 months ago

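Hello, authors! I tested your algorithm on a video of the TV host Kang Hui speaking and found that the results were poor, mainly because of how the Chinese audio features are extracted. Switching to DeepSpeech still did not work well. Is there any way to train a digital human on Chinese talking-head videos like this? Looking forward to your reply, thank you!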

wanghx1121 commented 2 months ago

I'm running into the same problem. Could the authors please fix it or suggest an approach? Thanks! @ZiqiaoPeng

jinqiupeter commented 2 months ago

Try Hubert. Here is my result using Hubert:

https://github.com/ZiqiaoPeng/SyncTalk/assets/12045814/e5429ca3-a31a-4e32-ac1f-cdccdc579b33
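
Some background on the HuBERT suggestion: HuBERT's convolutional front end has a total stride of 320 samples, so on 16 kHz audio it produces roughly one feature vector every 20 ms (50 Hz). Assuming 25 fps video, that works out to two feature vectors per video frame. The sketch below shows one way to group such features per frame and cut a fixed-size window around a frame index. It is not the SyncTalk or ER-NeRF code. The function names, the 2-per-frame ratio, the edge-padding rule and the window size of 16 are all illustrative assumptions.

```python
import numpy as np


def group_per_frame(feats: np.ndarray, feats_per_frame: int = 2) -> np.ndarray:
    """Group [T, D] audio features into [N, feats_per_frame, D], one row per video frame.

    Assumption: features arrive at feats_per_frame times the video frame rate
    (e.g. 50 Hz HuBERT features vs. 25 fps video). If T is not a multiple of
    feats_per_frame, the last feature row is repeated to fill the final frame.
    """
    t, d = feats.shape
    pad = (-t) % feats_per_frame
    if pad:
        feats = np.concatenate([feats, np.repeat(feats[-1:], pad, axis=0)], axis=0)
    return feats.reshape(-1, feats_per_frame, d)


def audio_window(frames: np.ndarray, idx: int, size: int = 16) -> np.ndarray:
    """Return `size` consecutive per-frame features around frame `idx`.

    Indices outside the clip are clamped to the first/last frame, so every
    window has the same shape: [size, feats_per_frame, D].
    """
    half = size // 2
    ids = np.clip(np.arange(idx - half, idx - half + size), 0, len(frames) - 1)
    return frames[ids]


# Usage with random stand-in data (a real pipeline would use features
# extracted from the clip's audio instead).
feats = np.random.rand(101, 1024).astype(np.float32)  # 101 feature vectors of size 1024
frames = group_per_frame(feats)                        # shape (51, 2, 1024)
win = audio_window(frames, idx=0)                      # shape (16, 2, 1024)
```

Clamping the window indices, rather than zero-padding, keeps the first and last frames' inputs on the same feature distribution as the rest of the clip.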

Tiandishihua commented 2 months ago

Hi, could I add you on WeChat to discuss this?

jinqiupeter commented 2 months ago

Sure, jinqiuqiujin

G-force78 commented 2 months ago

Impressive! Can you set up a fork with your settings?

wennjiee commented 2 months ago

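Did you run into NaN problems when training with HuBERT? I generated hu.npy with the ER-NeRF code and used it as the audio input, and partway through training I got the following:

loss=nan (nan), lr=0.000647: : 1% 38/4186 [00:07<10:00, 6.91it/s]
NaN or Inf found in input tensor.

While debugging, I found that the input a at this point is a normal vector, but the output enc_a = self.audio_net(a) is NaN. I compared this part with the ER-NeRF code and found no differences, so I'm not sure what went wrong. I hope the authors can help. @ZiqiaoPeng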
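
If the input a is finite but self.audio_net(a) is NaN, one likely explanation is that the network's weights were already corrupted by an earlier update, for example by an overflow or exploding gradient. In that case every later forward pass returns NaN regardless of the input. Another possibility is an intermediate overflow inside the network, for example under fp16 mixed precision. The sketch below uses plain PyTorch rather than SyncTalk's own classes. It checks a module's parameters for NaN/Inf and registers forward hooks that record, in execution order, which submodules emit non-finite outputs. The helper names and the toy network standing in for audio_net are hypothetical. For the backward pass, torch.autograd.set_detect_anomaly(True) can additionally point at the operation that first produced a NaN gradient.

```python
import torch
import torch.nn as nn


def nonfinite_params(module: nn.Module) -> list:
    """Names of parameters that already contain NaN or Inf."""
    return [name for name, p in module.named_parameters()
            if not torch.isfinite(p).all()]


def record_nonfinite_outputs(module: nn.Module) -> list:
    """Attach a forward hook to every submodule. Each hook appends the
    submodule's name when its output tensor contains NaN or Inf.
    Returns the list the hooks write into (filled in execution order)."""
    hits = []

    def make_hook(name):
        def hook(mod, inputs, output):
            if torch.is_tensor(output) and not torch.isfinite(output).all():
                hits.append(name)
        return hook

    for name, sub in module.named_modules():
        sub.register_forward_hook(make_hook(name or "<root>"))
    return hits


# Hypothetical stand-in for audio_net, with one weight deliberately set to NaN
# to mimic a network corrupted by an earlier bad optimizer step.
audio_net = nn.Sequential(nn.Linear(32, 64), nn.LeakyReLU(0.02), nn.Linear(64, 32))
with torch.no_grad():
    audio_net[0].weight[0, 0] = float("nan")

hits = record_nonfinite_outputs(audio_net)
a = torch.randn(4, 32)          # finite input, like the `a` in the report
enc_a = audio_net(a)            # NaN spreads from layer "0" to the output

print(torch.isfinite(a).all().item())   # True
print(nonfinite_params(audio_net))      # ['0.weight']
print(hits)                             # ['0', '1', '2', '<root>']
```

In a real run, calling nonfinite_params(self.audio_net) right before the failing forward pass distinguishes "weights already NaN" from "overflow inside the forward pass".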

yunyu commented 2 months ago

I'm getting this too. I have already applied the density grid fix that clamps the value before exp.

wennjiee commented 2 months ago

Do you mean applying torch.exp(torch.clamp()) in network.py?

qwert1887 commented 1 month ago

@wennjiee Hello, what min and max values did you use for the clamp: [-50, 50], [-9, 9], or something else?
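
Some numeric context for the clamp bounds: float32 exp(x) overflows to inf once x exceeds ln(3.4e38), about 88.7. float16, which is used when training with mixed precision, overflows once x exceeds ln(65504), about 11.09. So a bound of 50 keeps float32 finite (exp(50) is about 5.2e21) but not float16. An inf produced this way can later become NaN, for example through inf - inf. The NumPy sketch below shows the clamp-then-exp pattern discussed above. The bounds are placeholders, not the values used in SyncTalk or in the fix mentioned in this thread.

```python
import numpy as np


def clamped_exp(x, lo=-10.0, hi=10.0):
    """exp() with its argument clipped to [lo, hi] first.

    lo/hi are illustrative placeholders. hi=10 stays below ln(65504) ~= 11.09,
    so the result is finite even for float16 inputs.
    """
    return np.exp(np.clip(x, lo, hi))


x = np.array([-100.0, 0.0, 100.0], dtype=np.float32)
with np.errstate(over="ignore"):
    raw = np.exp(x)        # exp(100) overflows float32: only the last entry is inf
safe = clamped_exp(x)      # every entry is finite; exp(0) is still 1.0
```

Clamping only the upper bound would be enough to prevent overflow. The lower bound mainly keeps gradients from vanishing when the argument becomes very negative.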

ZiqiaoPeng commented 1 month ago

The loss=nan issue has been fixed, and you can now train with HuBERT.