Why feed in f0 in the MIDI version?

MoonInTheRiver / DiffSinger

DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code

MIT License


Issue #24, closed by zhangsanfeng86 2 years ago

zhangsanfeng86 commented 2 years ago

Hi @MoonInTheRiver,

In the MIDI version, why are f0 and uv also fed in?

f0 and uv are extracted from the raw wav, but at inference time only txt_token and midi are given. How are f0 and uv obtained then?

MoonInTheRiver commented 2 years ago

I guess you did not read my README.md: "c) in this version of codes, we used the melody frontend ([lyric + MIDI]->[F0]) to predict F0 contour". We use txt_token and midi to predict f0 and uv.
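To make the idea of deriving pitch from the score concrete, here is a minimal, naive baseline that maps per-frame MIDI note numbers straight to f0 (Hz) and a uv (unvoiced) flag using equal temperament (A4 = MIDI 69 = 440 Hz). This is not DiffSinger's melody frontend, which is a learned model that predicts a much richer contour from lyrics and MIDI together; the function name and the convention that note 0 marks a rest are assumptions for illustration only.

```python
from typing import List, Tuple


def midi_to_f0_uv(midi_notes: List[int]) -> Tuple[List[float], List[bool]]:
    """Naive baseline: map per-frame MIDI note numbers to (f0, uv).

    Hypothetical helper. Assumes a note number <= 0 marks a rest,
    which is treated as unvoiced with f0 = 0.
    """
    f0: List[float] = []
    uv: List[bool] = []
    for note in midi_notes:
        if note <= 0:
            # Rest: no pitch, flag the frame as unvoiced
            f0.append(0.0)
            uv.append(True)
        else:
            # Equal temperament relative to A4 (MIDI 69) = 440 Hz
            f0.append(440.0 * 2.0 ** ((note - 69) / 12.0))
            uv.append(False)
    return f0, uv


if __name__ == "__main__":
    f0, uv = midi_to_f0_uv([69, 81, 0, 60])
    print(f0, uv)
```

A flat, step-shaped contour like this lacks vibrato, glides and natural pitch drift, which is why a learned frontend is used to predict f0 and uv instead of reading them off the score directly.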

zhangsanfeng86 commented 2 years ago

Thank you for your quick reply, and sorry for my mistake :)

Another question: both "mel2ph" and "midi_dur" carry duration information. Why are both fed in, rather than only one of them?

MoonInTheRiver commented 2 years ago

You can try removing midi_dur. I just used all the information provided by the Opencpop team.
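One way to see how much the two inputs overlap is to recover per-phoneme durations from the frame alignment and sum them per note. The sketch below assumes that mel2ph[i] holds the 1-based phoneme index of mel frame i (0 for padding), that midi_dur is a note-level duration repeated for every phoneme of that note, and uses hypothetical frame settings (HOP_SIZE, SAMPLE_RATE) and a hypothetical note_of_ph mapping; check your own config and data for the real values.

```python
from collections import Counter
from typing import List

# Hypothetical frame settings; use the values from your own config.
HOP_SIZE = 128
SAMPLE_RATE = 24000


def phoneme_durations(mel2ph: List[int], n_phonemes: int) -> List[float]:
    """Per-phoneme durations in seconds, recovered from a frame alignment.

    Assumes mel2ph[i] is the 1-based phoneme index of mel frame i and
    that 0 marks padding frames, which are ignored.
    """
    counts = Counter(idx for idx in mel2ph if idx > 0)
    frame_sec = HOP_SIZE / SAMPLE_RATE
    return [counts.get(p, 0) * frame_sec for p in range(1, n_phonemes + 1)]


def durations_per_note(ph_durs: List[float], note_of_ph: List[int]) -> List[float]:
    """Sum the phoneme durations that belong to the same note.

    note_of_ph[p] is a hypothetical 0-based note id for phoneme p.
    """
    totals = [0.0] * (max(note_of_ph) + 1)
    for dur, note in zip(ph_durs, note_of_ph):
        totals[note] += dur
    return totals


if __name__ == "__main__":
    # 3 phonemes spanning 2, 3 and 4 frames; the last frame is padding.
    ph = phoneme_durations([1, 1, 2, 2, 2, 3, 3, 3, 3, 0], 3)
    # Phonemes 1 and 2 share note 0, phoneme 3 is note 1.
    print(durations_per_note(ph, [0, 0, 1]))
```

Under these assumptions, the per-note sums from mel2ph play roughly the same role as midi_dur, so the inputs are largely redundant. They are not strictly identical, though: mel2ph is a phoneme-level alignment of the recorded audio, while midi_dur describes note lengths in the score, and the two may disagree slightly. That is why dropping midi_dur is a reasonable experiment rather than a guaranteed no-op.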

zhangsanfeng86 commented 2 years ago