MoonInTheRiver / DiffSinger

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
MIT License

Inference with unseen songs #30

Closed: Charlottecuc closed this issue 2 years ago

Charlottecuc commented 2 years ago

Hi. Since DiffSinger (PopCS) needs ground-truth f0 information at inference, is it possible to synthesize an unseen song (with phoneme labels, phoneme durations, and notes provided) using the DiffSinger (PopCS) model?

Charlottecuc commented 2 years ago

For example, how can I run inference with the DiffSinger model (trained on the PopCS dataset) on an Opencpop song? Is it possible to extract f0 from the Opencpop song's wav and feed it to DiffSinger (PopCS) as the f0 input at inference? (I suspect the extracted gt f0 is to some extent entangled with the source speaker's characteristics, so this would fail.)
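To make the mismatch concern concrete, here is a minimal sketch of one generic workaround from voice conversion: transposing an extracted f0 contour so that its mean log-f0 matches the target singer's. This is not something the DiffSinger repository is known to provide, and the thread does not say whether it works for DiffSinger (PopCS). The function name `shift_f0_to_target`, the convention that 0.0 marks unvoiced frames, and the target statistic are all assumptions made for illustration.

```python
import math


def shift_f0_to_target(f0_hz, target_mean_log_f0):
    """Transpose an f0 contour so its mean log-f0 matches a target value.

    f0_hz: per-frame f0 values in Hz; 0.0 marks an unvoiced frame (assumed convention).
    target_mean_log_f0: mean natural-log f0 over the target singer's voiced frames,
        which would have to be estimated from the target singer's recordings.
    """
    voiced = [f for f in f0_hz if f > 0.0]
    if not voiced:
        return list(f0_hz)  # no voiced frames, nothing to shift

    source_mean_log_f0 = sum(math.log(f) for f in voiced) / len(voiced)
    # A constant offset in the log domain is a constant ratio in Hz,
    # i.e. a pure transposition that keeps the melodic shape.
    ratio = math.exp(target_mean_log_f0 - source_mean_log_f0)
    return [f * ratio if f > 0.0 else 0.0 for f in f0_hz]
```

A transposition like this only aligns the overall pitch range. It does not remove singer-specific detail such as vibrato or note transitions, so it addresses only part of the concern raised above.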

MoonInTheRiver commented 2 years ago

Why don't you use https://github.com/MoonInTheRiver/DiffSinger/blob/master/usr/configs/midi/readme.md? It includes checkpoints trained on Opencpop, and it does not use gt f0.

Charlottecuc commented 2 years ago

No. What I mean is: DiffSinger (PopCS version) cannot run inference on unseen songs, right?

MoonInTheRiver commented 2 years ago

The songs in the test set of PopCS are the unseen (no overlap with training data) songs.

Charlottecuc commented 2 years ago

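> The songs in the test set of PopCS are the unseen (no overlap with training data) songs.

What I mean is: the songs in the PopCS test set are sung by the same speaker as the training set, so the gt f0 extracted at inference is also coupled with that speaker's characteristics. As things stand, can DiffSinger (PopCS) only be tested on songs that the PopCS speaker has actually sung? (Otherwise, the gt f0 extracted from a wav of someone else's singing would not match the PopCS speaker's own fundamental frequency.) For example, if I wanted to synthesize "Happy Birthday to You" (《祝你生日快乐》), which is not in the dataset, in the PopCS timbre, would that be impossible? @MoonInTheRiver Thanks.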
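For context on the "notes provided" case: a score's notes determine a nominal pitch without any reference recording, using the standard MIDI-to-frequency relation f = 440 · 2^((n − 69) / 12). The sketch below builds a flat, piecewise-constant f0 contour from note numbers and durations. It is an illustration only. The helper names, the 10 ms frame length, and the use of `None` for rests are assumptions, and the thread does not indicate that feeding such a flat contour to DiffSinger (PopCS) gives usable results.

```python
def midi_to_hz(note):
    """Convert a MIDI note number to frequency in Hz (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def notes_to_f0(notes, durations_s, frame_s=0.01):
    """Build a piecewise-constant f0 contour from a note sequence.

    notes: MIDI note numbers, with None marking a rest (assumed convention).
    durations_s: duration of each note in seconds.
    frame_s: frame length in seconds (hypothetical value, not taken from the repo).
    """
    f0 = []
    for note, dur in zip(notes, durations_s):
        n_frames = int(round(dur / frame_s))
        hz = 0.0 if note is None else midi_to_hz(note)  # 0.0 marks unvoiced frames
        f0.extend([hz] * n_frames)
    return f0


# Example: A4 for 20 ms, a 10 ms rest, then A3 for 30 ms.
contour = notes_to_f0([69, None, 57], [0.02, 0.01, 0.03])
```

Unlike f0 extracted from a recording, a contour built this way contains no vibrato or note transitions, which is one reason a model trained on ground-truth f0 may not respond well to it.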

Charlottecuc commented 2 years ago

Solved.

Opdoop commented 2 years ago

@Charlottecuc Hmm... so can inference be run with samples from a different speaker? How were the results?

lixucuhk commented 1 year ago

@Charlottecuc I just ran into this problem as well. I want to use the DiffSinger model trained on PopCS to generate a song from the lyrics information provided in Opencpop. Could you please tell me how you achieved that? Many thanks!

lvZic commented 1 year ago

Is there any solution for inference with an unseen singer? @lixucuhk @RayeRen @Opdoop @Charlottecuc @MoonInTheRiver