bfs18 / nsynth_wavenet

parallel wavenet based on nsynth

Three points that differ from the paper #20

Open wangrui5781 opened 6 years ago

wangrui5781 commented 6 years ago

Hi, thank you for sharing this code. I noticed some differences compared to the papers:

  1. Why do you discard the skip connections in parallel WaveNet, which are used in the original WaveNet?
  2. I find a local condition convolved with the mel spectrogram between the IAF and the last ReLU layer. What does it mean?
  3. Parallel WaveNet generates the output x from mu_tot and s_tot, in contrast to ClariNet, which regards the n-th flow's sample z as the output. What do you think about that?

zhang-jian commented 6 years ago

I am not the author of this code, but this is my understanding.

  1. Why do you discard the skip connections in parallel WaveNet, which are used in the original WaveNet? The paper is explicit about this: "The student network consisted of the same WaveNet architecture layout, except with different inputs and outputs and no skip connections." (Parallel WaveNet: Fast High-Fidelity Speech Synthesis) A sketch of the block with and without the skip path follows this item.

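To make the skip-connection difference concrete, here is a minimal, framework-agnostic sketch of a gated residual block; the `conv_*` callables, the toy `identity` stand-in, and the three-block loop are illustrative assumptions, not code from this repo:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_block(x, conv_filter, conv_gate, conv_out, use_skip=True):
    # WaveNet's gated activation unit; conv_* stand in for dilated 1-D
    # convolutions and can be any shape-preserving callables here.
    h = np.tanh(conv_filter(x)) * sigmoid(conv_gate(x))
    out = conv_out(h)
    residual = x + out                 # fed to the next block in the stack
    skip = out if use_skip else None   # teacher only: summed into the post-net
    return residual, skip

# Teacher-style stack: accumulate skip outputs for the post-net.
# A student-style stack would pass use_skip=False and keep only `residual`,
# matching "no skip connections" in the parallel WaveNet paper.
x = np.random.randn(16)
identity = lambda t: 0.1 * t           # toy stand-in for a trained conv
skips = []
for _ in range(3):
    x, s = residual_block(x, identity, identity, identity, use_skip=True)
    skips.append(s)
post_net_input = np.sum(skips, axis=0)
```
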
  2. I find a local condition convolved with the mel spectrogram between the IAF and the last ReLU layer. What does it mean? This is just a difference in implementation. I don't know whether it is important or not (I think it is not); a guess at what that path looks like is sketched below.

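Purely as a guess at what that extra conditioning path does (the names, shapes, and the toy `proj` projection here are hypothetical, not taken from the repo), it would look roughly like:

```python
import numpy as np

def output_stack(z, mel_up, proj=lambda m: 0.1 * m):
    # z: output of the IAF flows; mel_up: mel condition already upsampled
    # to the sample rate; proj: a 1x1-conv-like projection (toy stand-in).
    h = z + proj(mel_up)        # extra conditioning injected after the flows
    return np.maximum(h, 0.0)   # the "last ReLU layer" mentioned above
```
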
  3. Parallel WaveNet generates the output x from mu_tot and s_tot, in contrast to ClariNet, which regards the n-th flow's sample z as the output. What do you think about that? My understanding is that DeepMind's parallel WaveNet needs a significant amount of sampling to estimate its KL loss by Monte Carlo, so it makes sense that the student sample is drawn from mu_tot and s_tot. ClariNet computes its KL loss in closed form, so the output of the last IAF flow can be used directly as the student sample. See the sketch after this list for how the totals compose.

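Here is a minimal sketch of how the per-flow (mu_i, s_i) compose into a single (mu_tot, s_tot), plus the closed-form Gaussian KL that ClariNet relies on. Parallel WaveNet actually uses logistic distributions, but the affine composition is identical; all numbers below are toy values:

```python
import numpy as np

def compose_flows(z0, mus, scales):
    # Run z0 through a stack of affine IAF flows: z <- z * s_i + mu_i.
    # Returns the final sample plus the equivalent single-affine parameters
    # (mu_tot, s_tot), so that z == z0 * s_tot + mu_tot holds exactly.
    z, mu_tot, s_tot = z0, 0.0, 1.0
    for mu_i, s_i in zip(mus, scales):
        z = z * s_i + mu_i
        mu_tot = mu_tot * s_i + mu_i
        s_tot = s_tot * s_i
    return z, mu_tot, s_tot

def gaussian_kl(mu_q, s_q, mu_p, s_p):
    # Closed-form KL( N(mu_q, s_q^2) || N(mu_p, s_p^2) ), as used in ClariNet.
    return np.log(s_p / s_q) + (s_q**2 + (mu_q - mu_p)**2) / (2.0 * s_p**2) - 0.5

z0 = np.random.randn(8)
mus = [np.random.randn(8) * 0.1 for _ in range(4)]
scales = [np.full(8, 0.9) for _ in range(4)]
z, mu_tot, s_tot = compose_flows(z0, mus, scales)
assert np.allclose(z, z0 * s_tot + mu_tot)

# Parallel WaveNet: the KL is estimated by Monte Carlo, so many samples are
# drawn directly from (mu_tot, s_tot) without re-running the flows.
samples = mu_tot + s_tot * np.random.randn(128, 8)
# ClariNet: the KL is closed-form, so the last flow's z can be the sample.
kl = gaussian_kl(mu_tot, s_tot, mu_p=0.0, s_p=1.0)
```

Since z == z0 * s_tot + mu_tot holds exactly, drawing fresh noise through (mu_tot, s_tot) is distributionally the same as re-running the flows, just much cheaper, which is presumably why parallel WaveNet takes its output from the totals.
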
wangrui5781 commented 6 years ago

Thanks for your reply. I have corrected my mistake according to your answer. I read ClariNet before parallel WaveNet, so I did not notice the differences. Aha, that is not "rigorous scholarship" on my part. However, both models generate noisy audio, at least worse than the teacher. Looking at the STFT, I find that the student cannot learn the high-frequency distribution. Any ideas for improving the model?
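One way to quantify that STFT observation is to compare the fraction of spectral energy above some cutoff between teacher and student outputs. A sketch, where `teacher_wav`/`student_wav`, the 16 kHz rate, and the 4 kHz cutoff are all assumptions:

```python
import numpy as np
from scipy.signal import stft

def high_band_energy(wav, sr=16000, cutoff_hz=4000, nperseg=1024):
    # Fraction of spectral energy above cutoff_hz; a quick check of whether
    # generated audio matches the teacher in the high frequencies.
    f, _, Z = stft(wav, fs=sr, nperseg=nperseg)
    mag = np.abs(Z) ** 2
    return mag[f >= cutoff_hz].sum() / mag.sum()

# Hypothetical usage with two 16 kHz mono float arrays:
# print(high_band_energy(teacher_wav), high_band_energy(student_wav))
```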