FuxiVirtualHuman / AAAI22-one-shot-talking-face

Code for One-shot Talking Face Generation from Single-speaker Audio-Visual Correlation Learning (AAAI 2022)

About phoneme extraction #4

Open yaleimeng opened 2 years ago

yaleimeng commented 2 years ago

(1) The README mentions that phonemes can be extracted with other ASR tools and then mapped to the CMU phone set. Could you walk through the detailed steps using a common ASR tool as an example?

(2) Also, if the audio is Chinese speech, can this phoneme mapping still be used?

FuxiVirtualHuman commented 2 years ago

> (1) The README mentions that phonemes can be extracted with other ASR tools and then mapped to the CMU phone set. Could you walk through the detailed steps using a common ASR tool as an example?
>
> (2) Also, if the audio is Chinese speech, can this phoneme mapping still be used?

(1) You can use ASR tools such as Microsoft Azure and Google Cloud. (2) Different languages use different phone sets.

yaleimeng commented 2 years ago

@FuxiVirtualHuman Microsoft Azure and Google Cloud both appear to be cloud services, and neither is open source. If the project depends on them, that is a serious obstacle for offline workflows without network access. I hope your team will continue extending the Chinese-speech support and help move the whole research area forward.

ZayneHuang commented 1 year ago

Hi @FuxiVirtualHuman,

Thank you for your nice work!

Can Microsoft Azure extract the phonemes automatically? It seems that the Speech SDK by Azure can now output the IPA phonetic alphabet directly (see link). But I am still confused about how to map it to the CMU phone set as you mentioned. Maybe I can use the mapping provided by this file?
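For concreteness, here is a minimal sketch of the kind of lookup table I have in mind, assuming the ASR tool returns a list of IPA symbols. The table below covers only a handful of phonemes (a full table would enumerate all ~39 CMU phonemes), and the `SPN` fallback for unmapped symbols is my own assumption, not something from this repo:

```python
# Sketch: map IPA phoneme labels (e.g. from an ASR tool) to CMU/ARPAbet
# symbols. Only a few entries are shown; a complete table would cover the
# whole CMU phone set.
IPA_TO_ARPABET = {
    "i": "IY", "ɪ": "IH", "æ": "AE", "ʌ": "AH", "u": "UW",
    "ʃ": "SH", "θ": "TH", "ð": "DH", "ŋ": "NG", "tʃ": "CH", "dʒ": "JH",
    "h": "HH", "p": "P", "t": "T", "k": "K", "s": "S",
    "m": "M", "n": "N", "l": "L",
}

def ipa_to_arpabet(ipa_seq):
    """Convert a list of IPA symbols to ARPAbet; unknown symbols fall back
    to "SPN" (a hypothetical spoken-noise placeholder)."""
    return [IPA_TO_ARPABET.get(p, "SPN") for p in ipa_seq]

# Example: the word "ship" as IPA symbols
arpabet = ipa_to_arpabet(["ʃ", "ɪ", "p"])
```

The resulting ARPAbet symbols could then be looked up in `phindex.json` to get the integer indices the model expects.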

Moreover, I found that the phindex.json in your repo has two extra ARPAbet symbols (NSN and SIL) compared to the original CMU dictionary. What do they mean, and how can I get them from ASR tools?
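My current guess (unconfirmed by the authors) is that SIL marks silence and NSN marks non-speech noise, as in Kaldi-style lexicons. If that is right, SIL could be inserted wherever the ASR phoneme timings leave a gap, roughly like this:

```python
# Sketch (my assumption, not confirmed by the authors): insert a SIL token
# wherever consecutive phoneme segments leave a silent gap larger than a
# threshold. Each segment is (phoneme, start_sec, end_sec) from an ASR tool.
def insert_sil(segments, min_gap=0.1):
    out = []
    prev_end = 0.0
    for ph, start, end in segments:
        if start - prev_end > min_gap:
            out.append(("SIL", prev_end, start))
        out.append((ph, start, end))
        prev_end = end
    return out

# Example: a 0.5 s pause between "AY" and "Y" becomes a SIL segment
segs = [("HH", 0.0, 0.1), ("AY", 0.1, 0.3), ("Y", 0.8, 0.9)]
with_sil = insert_sil(segs)
```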

Looking forward to your reply!