Open npuichigo opened 6 years ago
@npuichigo I have also rewritten the training code based on the graph. It works and the audio sounds good, but the waveform is not different from the target. The synthesised audio as a whole is delayed relative to the target.
I used tensorboard to inspect your model structure and found that the pb model you provided just uses one softmax with 256 outputs (8 bits).
However, the paper uses two separate DNNs to predict the coarse and fine parts of a sample. Is that because your model reuses the matrices O1 and O3 (O2 and O4), or because you only support 8 bits with mu-law compression?
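
To make the two options in my question concrete, here is a minimal sketch of the difference between a coarse/fine split and a single 8-bit mu-law class. It is not taken from the repo or the pb model. The function names are hypothetical, and I am assuming signed 16-bit PCM input and mu = 255.

```python
import math


def split_coarse_fine(sample_16bit):
    """Split a signed 16-bit sample into two 8-bit classes (hypothetical helper).

    coarse = high byte, fine = low byte; each would need its own 256-way softmax.
    """
    unsigned = sample_16bit + 32768  # shift [-32768, 32767] to [0, 65535]
    return unsigned // 256, unsigned % 256


def combine_coarse_fine(coarse, fine):
    """Inverse of split_coarse_fine: rebuild the signed 16-bit sample."""
    return coarse * 256 + fine - 32768


def mu_law_encode(x, mu=255):
    """Map a float in [-1, 1] to one class in [0, mu] (a single 256-way softmax)."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((compressed + 1) / 2 * mu + 0.5)


def mu_law_decode(index, mu=255):
    """Approximate inverse of mu_law_encode (lossy: only mu + 1 levels exist)."""
    compressed = 2 * index / mu - 1
    return math.copysign(math.expm1(abs(compressed) * math.log1p(mu)) / mu, compressed)


if __name__ == "__main__":
    coarse, fine = split_coarse_fine(1234)
    print(coarse, fine, combine_coarse_fine(coarse, fine))
    print(mu_law_encode(1234 / 32768))
```

The coarse/fine split is lossless for 16-bit audio but needs two 256-way outputs per sample, while mu-law needs only one 256-way output and loses precision. A single softmax with 256 outputs would be consistent with the second option, though it does not rule out the first.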