fatchord / WaveRNN

WaveRNN Vocoder + TTS
https://fatchord.github.io/model_outputs/
MIT License

How to split mel-spectrum in Multi-band wavernn? #219

Closed: Shijie-Liu007 closed this issue 3 years ago

Shijie-Liu007 commented 3 years ago

Hey all!

I want to implement multi-band WaveRNN based on fatchord's version. During training, I split the audio samples into 4 subbands using an analysis filter, but how should I split the mel spectrogram so that it corresponds to the audio subbands? Can I simply divide it into four parts in order? Opinions and ideas about multi-band WaveRNN would be greatly appreciated!

Reference: DurIAN: Duration Informed Attention Network For Multimodal Synthesis https://arxiv.org/abs/1909.01700

maozhiqiang commented 3 years ago

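@Shijie-Liu007 I don't think the mel spectrogram needs to be split. Previously, one frame corresponded to 256 audio samples; with 4 bands, one frame only needs to correspond to 64 samples per band. That is my understanding: the mel spectrogram is the same input as before, and only the upsampling factor becomes 1/4 of the original.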
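
The arithmetic in this reply can be checked with a minimal sketch. It is not code from the fatchord repository: the names `upsample_mel`, `hop_length`, `n_bands` and `band_hop` are made up for illustration, the mel and audio arrays are random placeholders, and the `reshape` call only stands in for the analysis filter so that the subband shape is right (it does no real filtering). Nearest-neighbour repetition is likewise an assumption standing in for whatever upsampling network the model uses.

```python
import numpy as np

def upsample_mel(mel, factor):
    """Repeat each mel frame `factor` times along the time axis."""
    return np.repeat(mel, factor, axis=1)

hop_length = 256   # audio samples per mel frame in the full-band setup
n_bands = 4        # number of subbands from the analysis filter
n_mels, n_frames = 80, 10

# Placeholder data: the full, unsplit mel spectrogram and its waveform.
mel = np.random.randn(n_mels, n_frames)
audio = np.random.randn(n_frames * hop_length)

# Shape-only stand-in for the analysis filter (NOT a real filter bank):
# each band ends up with len(audio) // n_bands samples.
subbands = audio.reshape(-1, n_bands).T          # (n_bands, 640)

# Each band is 4x shorter, so the mel is upsampled by 256 / 4 = 64.
band_hop = hop_length // n_bands
cond = upsample_mel(mel, band_hop)               # (n_mels, 640)

# The same conditioning sequence now lines up with every subband.
assert cond.shape[1] == subbands.shape[1]
```

With these numbers, each of the 10 mel frames covers 64 time steps in every subband, so the full 80-bin mel can condition all four bands at once without being divided along the frequency axis.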

Shijie-Liu007 commented 3 years ago

Thank you very much for the reply! I will try it the way you describe first. Thanks again!