BrasD99 / HeyGenClone

A simple and open-source analogue of the HeyGen system

Chinese language support #1

Closed: wanghao-007 closed this issue 10 months ago

wanghao-007 commented 10 months ago

Hello, comrade! Is there currently any solution for translating into Chinese? If you don't mind sharing it, I'd like to try implementing it. Thanks.

BrasD99 commented 10 months ago

Hello @wanghao-007! Glad to see your interest in my solution.

I think it's possible to try Chinese translation right now. The main point to consider is that the audio from the input video is transcribed with Whisper, and so far only English is supported there. HeyGenClone then translates the transcription into the target language using googletrans. Voice cloning is done with the Coqui solution, which declares support for Chinese. As for lip sync, I don't foresee any problems.
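The reasoning in the paragraph above amounts to checking each pipeline stage against the language pair. Below is a minimal sketch of that check, not HeyGenClone's code: the language sets reflect only what is stated in this thread (English-only transcription, Chinese declared for voice cloning), the codes `"en"` and `"zh"` are illustrative, and `blocking_stages` is a hypothetical helper.

```python
# Hypothetical model of the dubbing pipeline's language support, based only on
# the statements in this thread. Real libraries support more languages and may
# use different language codes.
TRANSCRIBE_LANGS = {"en"}        # transcription (Whisper): English only "so far", per the author
TRANSLATE_LANGS = {"en", "zh"}   # translation (googletrans): illustrative subset
TTS_LANGS = {"en", "zh"}         # voice cloning (Coqui): Chinese support "is declared"


def blocking_stages(source: str, target: str) -> list[str]:
    """Return the names of stages that cannot handle this source/target pair."""
    problems = []
    # The source language must be transcribable.
    if source not in TRANSCRIBE_LANGS:
        problems.append("transcription (Whisper)")
    # Translation needs both ends of the pair.
    if source not in TRANSLATE_LANGS or target not in TRANSLATE_LANGS:
        problems.append("translation (googletrans)")
    # The cloned voice is synthesized in the target language.
    if target not in TTS_LANGS:
        problems.append("voice cloning (Coqui)")
    # Lip sync works on the generated audio and the video frames, so it is
    # treated as language-agnostic here, matching the author's assessment.
    return problems


print(blocking_stages("en", "zh"))  # English video dubbed into Chinese
print(blocking_stages("zh", "en"))  # Chinese video as input
```

Under these assumptions, dubbing an English video into Chinese passes every stage, while a Chinese-language input video is blocked at transcription, which is the constraint the author points out.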

But I should note that my solution is still under active development. There is still a lot I want to fix and improve. I have not tested Chinese translation yet, but I would be glad to receive feedback on it.

And, of course, I would especially welcome suggestions for improvement. I need contributors with new ideas.

wanghao-007 commented 10 months ago

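Thank you for your reply. Your project is quite complex and touches on many areas of deep learning. After reproducing it, I ran a few videos and noticed a small problem: after the speaker's face is detected, the mouth shape is modified and the result is pasted back, there is some visual misalignment and overlapping of the pasted texture, which doesn't look natural. Also, if you use an online translation service, you could use one of OpenAI's APIs; would that be faster? As for Chinese, I don't have a solution yet. Mainly, I'm still unfamiliar with how to match Chinese mouth shapes, so I need to look more closely at the details of your project.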

BrasD99 commented 10 months ago

@wanghao-007 I could use OpenAI, of course, but keep in mind that it is not available in Russia. I don't see the point of using a VPN; it slows down development. For now, I won't touch the translation functionality.

As for lip sync: I have already encountered this problem and am thinking about how to solve it. The accuracy is poor; that's a fact.
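One common mitigation for hard seams when a regenerated face or mouth region is pasted back into the frame is to alpha-blend it with a feathered mask, so its weight fades to zero at the patch border instead of switching abruptly. This is not HeyGenClone's method, and it only softens visible edges; it does not correct geometric misalignment, which depends on face detection and alignment. The sketch uses plain nested lists of grayscale values so it is self-contained; `feather_mask` and `paste_feathered` are hypothetical names, and a real implementation would operate on color image arrays (for example with NumPy or OpenCV).

```python
def feather_mask(h: int, w: int, feather: int) -> list[list[float]]:
    """Weights in [0, 1]: 1 in the patch interior, ramping linearly to 0 at the border."""
    mask = []
    for y in range(h):
        row = []
        for x in range(w):
            # Distance in pixels to the nearest edge of the patch.
            d = min(y, x, h - 1 - y, w - 1 - x)
            row.append(min(1.0, d / feather) if feather > 0 else 1.0)
        mask.append(row)
    return mask


def paste_feathered(frame, patch, top, left, feather):
    """Alpha-blend `patch` into a copy of `frame` at (top, left) with soft edges."""
    out = [row[:] for row in frame]  # leave the input frame untouched
    h, w = len(patch), len(patch[0])
    mask = feather_mask(h, w, feather)
    for y in range(h):
        for x in range(w):
            a = mask[y][x]
            fy, fx = top + y, left + x
            # Weighted mix of the generated patch and the original frame.
            out[fy][fx] = a * patch[y][x] + (1 - a) * frame[fy][fx]
    return out


frame = [[0.0] * 5 for _ in range(5)]
patch = [[100.0] * 5 for _ in range(5)]
blended = paste_feathered(frame, patch, 0, 0, feather=2)
```

With `feather=2`, the patch's outer ring keeps the original frame pixel, the next ring is a 50/50 mix, and the center is fully replaced. A wider feather hides seams better but blurs more of the generated mouth into the background.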