austinmoehle / wavernn

WaveRNN-based waveform generator & demo of TensorFlow CuDNN-GRU usage.

Questions about model structure #3

Open npuichigo opened 6 years ago

npuichigo commented 6 years ago

I used TensorBoard to inspect your model structure and found that the .pb model you provided uses only a single softmax with 256 outputs (8 bits).


However, the paper uses two separate DNNs to predict the coarse and fine parts of a sample. Is that because your model reuses the matrices O1 and O3 (O2 and O4), or because you only support 8 bits with mu-law compression?
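To make the two options in the question concrete, here is a minimal sketch of both target encodings: an 8-bit mu-law class (one 256-way softmax) and a 16-bit sample split into coarse and fine bytes (two 256-way softmaxes). The function names are hypothetical and are not taken from this repository; the sketch assumes float samples in [-1, 1] for mu-law and signed 16-bit integers for the coarse/fine split.

```python
import math

MU = 255  # assumption: 8-bit mu-law, i.e. 256 quantization classes


def mulaw_encode(x, mu=MU):
    """Map a float sample in [-1, 1] to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip to the valid range
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((compressed + 1.0) / 2.0 * mu + 0.5)  # round to nearest class


def mulaw_decode(index, mu=MU):
    """Approximate inverse of mulaw_encode: class index back to a float."""
    compressed = 2.0 * index / mu - 1.0
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)


def split_coarse_fine(sample16):
    """Split a signed 16-bit sample into (coarse, fine) 8-bit classes."""
    unsigned = sample16 + 32768  # shift [-32768, 32767] to [0, 65535]
    return unsigned // 256, unsigned % 256  # high byte, low byte


def join_coarse_fine(coarse, fine):
    """Inverse of split_coarse_fine."""
    return coarse * 256 + fine - 32768
```

With the mu-law variant, one softmax over 256 classes covers the whole sample at reduced resolution. With the coarse/fine variant, two 256-way predictions together recover the full 16-bit value, which would match the two output networks described in the paper.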


MlWoo commented 6 years ago

@npuichigo I have also rewritten the training code based on the graph. It works and the audio sounds good. The waveform is not different from the target, but the synthesised audio as a whole is delayed relative to the target.
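One way to quantify such a delay is to look for the lag that maximizes the cross-correlation between the target and the synthesised waveform. The following is only a rough sketch, assuming both signals are plain lists of samples at the same sample rate; `estimate_delay` and `max_lag` are hypothetical names, not part of this repository.

```python
def estimate_delay(target, synthesized, max_lag):
    """Return the lag in [0, max_lag] (in samples) at which the synthesized
    signal, shifted back by that lag, best correlates with the target."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Only compare the region where both signals overlap at this lag.
        n = min(len(target), len(synthesized) - lag)
        score = sum(target[i] * synthesized[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Usage: a toy target and a copy delayed by three samples.
target = [0, 1, 0, -1, 0, 2, 0, 0]
synthesized = [0, 0, 0] + target
delay = estimate_delay(target, synthesized, max_lag=5)
```

If the estimated lag is constant across utterances, the offset may come from how inputs and targets are shifted during training or generation rather than from the model itself; that is a guess to check, not a diagnosis.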