NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

correct f0 frame_length error #23

Open JeffpanUK opened 4 years ago

JeffpanUK commented 4 years ago

The "frame_length" parameter of compute_f0 should be set to "win_length" rather than "filter_length". When filter_length > win_length, the F0 computation is incorrect, because each F0 frame then spans more samples than the window used for the matching mel frame. In the mel computation, we first build the window with win_length samples and then pad it to filter_length. F0 should likewise be computed within a win_length frame rather than a filter_length frame.
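To illustrate the mismatch, here is a minimal NumPy sketch. It does not use the repository's code: `padded_window` and `frame_signal` are hypothetical helpers, the sizes are made-up values chosen so that filter_length > win_length, and centered zero-padding of a Hann window is an assumption about how the mel window is built.

```python
import numpy as np

def padded_window(win_length, filter_length):
    """Hann window of win_length samples, zero-padded (centered) to filter_length."""
    window = np.hanning(win_length)
    pad_total = filter_length - win_length
    left = pad_total // 2
    return np.pad(window, (left, pad_total - left))

def frame_signal(signal, frame_length, hop_length):
    """Split a 1-D signal into overlapping frames of frame_length samples."""
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    return np.stack([
        signal[i * hop_length : i * hop_length + frame_length]
        for i in range(n_frames)
    ])

# Hypothetical settings with filter_length > win_length
filter_length, win_length, hop_length = 2048, 1024, 256

# Mel side: only about win_length samples of each frame carry non-zero weight
window = padded_window(win_length, filter_length)
mel_support = int(np.count_nonzero(window))

# F0 side: the frame covers frame_length samples, whatever value is passed in
signal = np.random.default_rng(0).standard_normal(8192)
frames_filter = frame_signal(signal, filter_length, hop_length)  # current behaviour
frames_win = frame_signal(signal, win_length, hop_length)        # proposed behaviour

print("samples weighted per mel frame:", mel_support)
print("samples per F0 frame (filter_length):", frames_filter.shape[1])
print("samples per F0 frame (win_length):", frames_win.shape[1])
```

With these values, an F0 frame of filter_length samples covers roughly twice the span that the mel window actually weights, while a win_length frame matches it.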

candlewill commented 4 years ago

Yes, I think you're right.