Rayhane-mamah / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation
MIT License

Issue with Dimensions/shapes of layers in Wavenet-vocoder, they seem incorrect? #486

Open jjoe1 opened 4 years ago

jjoe1 commented 4 years ago

@Rayhane-mamah I was trying to understand the WaveNet vocoder implementation, and some of the layer dimensions don't seem to match what I understood from the WaveNet paper.

I wanted to check whether you could shed light on some of these dimensions, as maybe I'm missing something.

1) The input_convolution kernel is shown with shape 1x1x128. Isn't the input to the input_convolution layer the mel-spectrogram frames, i.e. 80 float values per frame for up to 10,000 frames, so that in_channels for this conv1d layer should be 80 instead of 1?
(10,000 is the maximum number of decoder steps, defined as max_iters in hparams.py.)

inference/input_convolution/kernel:0 (float32_ref 1x1x128) [128, bytes: 512]

2) Is there a reason for the upsampling stride values to be [11, 25]? Are the specific numbers 11 and 25 special, or do they affect other shapes/dimensions?

inference/ConvTranspose1D_layer_0/kernel:0 (float32_ref 1x11x80x80) [70400, bytes: 281600]
inference/ConvTranspose1D_layer_1/kernel:0 (float32_ref 1x25x80x80) [160000, bytes: 640000]

3) Why is the number of input channels 128 in residual_block_causal_conv and 80 in residual_block_cin_conv? What exactly are their inputs (e.g. mel-spectrogram frames, or just raw floating-point values)? Is the WaveNet vocoder generating just 1 float value per input mel-spectrogram frame of 80 floats?

inference/ResidualConv1DGLU_0/residual_block_causal_conv_ResidualConv1DGLU_0/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_0/residual_block_cin_conv_ResidualConv1DGLU_0/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
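To make the three questions concrete, here is a minimal NumPy sketch of one way the printed kernel shapes could fit together. It is only a reading of the shapes, not the repository's code, and it rests on assumptions that nothing in this thread confirms: that conv1d kernels are laid out as (width, in_channels, out_channels), that input_convolution consumes one scalar audio sample per timestep, that the 80-channel mel frames enter only as conditioning through the cin convolution, and that the strides [11, 25] together expand each mel frame to 11 * 25 = 275 timesteps. The helper names (`conv1x1`, `causal_conv`) and the random weights are purely illustrative, and `np.repeat` is a crude stand-in for the two ConvTranspose1D layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def conv1x1(x, in_ch, out_ch):
    """Width-1 convolution: one (in_ch, out_ch) matrix applied at every timestep."""
    assert x.shape[1] == in_ch
    return x @ (rng.standard_normal((in_ch, out_ch)) * 0.01)

def causal_conv(x, width, in_ch, out_ch, dilation=1):
    """Causal convolution: output at time t sees inputs at t, t-d, ..., t-(width-1)*d."""
    w = rng.standard_normal((width, in_ch, out_ch)) * 0.01
    pad = (width - 1) * dilation
    xp = np.pad(x, ((pad, 0), (0, 0)))  # left-pad the time axis only
    steps = x.shape[0]
    out = np.zeros((steps, out_ch))
    for k in range(width):
        out += xp[k * dilation : k * dilation + steps] @ w[k]
    return out

# ASSUMPTION: each mel frame covers prod([11, 25]) = 275 audio samples.
frames = 4
hop = 11 * 25
steps = frames * hop

audio = rng.standard_normal((steps, 1))   # ASSUMPTION: scalar audio input, hence in_channels = 1
mel = rng.standard_normal((frames, 80))   # 80 mel bins per frame
cond = np.repeat(mel, hop, axis=0)        # stand-in for ConvTranspose1D_layer_0/1

x = conv1x1(audio, 1, 128)                                  # input_convolution: 1x1x128
h = causal_conv(x, 3, 128, 256) + conv1x1(cond, 80, 256)    # 3x128x256 plus 1x80x256
a, b = np.split(h, 2, axis=1)                               # two halves of 128 channels
z = np.tanh(a) * sigmoid(b)                                 # gated activation
skip = conv1x1(z, 128, 128)                                 # residual_block_skip_conv
res = conv1x1(z, 128, 128) + x                              # residual_block_out_conv + input
hidden = np.maximum(conv1x1(np.maximum(skip, 0), 128, 128), 0)  # final_convolution_1
y = conv1x1(hidden, 128, 2)                                 # final_convolution_2: 1x128x2

print(x.shape, h.shape, z.shape, res.shape, y.shape)
```

Under these assumptions the causal convolution sees the 128-channel hidden state rather than the mel frames, the cin convolution sees the upsampled 80-channel conditioning, and the network emits one 2-channel output per audio timestep, so 275 outputs per mel frame rather than one.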

The full printout of the WaveNet network's variables is shown below:

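>>> # Imports assume TensorFlow 1.x, where slim lives under tf.contrib
>>> import tensorflow as tf
>>> import tensorflow.contrib.slim as slim
>>> model_vars = tf.trainable_variables()
>>> slim.model_analyzer.analyze_vars(model_vars, print_info=True)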
---------
Variables: name (type shape) [size]
---------
inference/ConvTranspose1D_layer_0/kernel:0 (float32_ref 1x11x80x80) [70400, bytes: 281600]
inference/ConvTranspose1D_layer_0/bias:0 (float32_ref 80) [80, bytes: 320]
inference/ConvTranspose1D_layer_1/kernel:0 (float32_ref 1x25x80x80) [160000, bytes: 640000]
inference/ConvTranspose1D_layer_1/bias:0 (float32_ref 80) [80, bytes: 320]
inference/input_convolution/kernel:0 (float32_ref 1x1x128) [128, bytes: 512]
inference/input_convolution/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_0/residual_block_causal_conv_ResidualConv1DGLU_0/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_0/residual_block_causal_conv_ResidualConv1DGLU_0/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_0/residual_block_cin_conv_ResidualConv1DGLU_0/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_0/residual_block_cin_conv_ResidualConv1DGLU_0/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_0/residual_block_skip_conv_ResidualConv1DGLU_0/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_0/residual_block_skip_conv_ResidualConv1DGLU_0/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_0/residual_block_out_conv_ResidualConv1DGLU_0/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_0/residual_block_out_conv_ResidualConv1DGLU_0/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_1/residual_block_causal_conv_ResidualConv1DGLU_1/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_1/residual_block_causal_conv_ResidualConv1DGLU_1/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_1/residual_block_cin_conv_ResidualConv1DGLU_1/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_1/residual_block_cin_conv_ResidualConv1DGLU_1/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_1/residual_block_skip_conv_ResidualConv1DGLU_1/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_1/residual_block_skip_conv_ResidualConv1DGLU_1/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_1/residual_block_out_conv_ResidualConv1DGLU_1/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_1/residual_block_out_conv_ResidualConv1DGLU_1/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_2/residual_block_causal_conv_ResidualConv1DGLU_2/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_2/residual_block_causal_conv_ResidualConv1DGLU_2/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_2/residual_block_cin_conv_ResidualConv1DGLU_2/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_2/residual_block_cin_conv_ResidualConv1DGLU_2/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_2/residual_block_skip_conv_ResidualConv1DGLU_2/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_2/residual_block_skip_conv_ResidualConv1DGLU_2/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_2/residual_block_out_conv_ResidualConv1DGLU_2/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_2/residual_block_out_conv_ResidualConv1DGLU_2/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_3/residual_block_causal_conv_ResidualConv1DGLU_3/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_3/residual_block_causal_conv_ResidualConv1DGLU_3/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_3/residual_block_cin_conv_ResidualConv1DGLU_3/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_3/residual_block_cin_conv_ResidualConv1DGLU_3/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_3/residual_block_skip_conv_ResidualConv1DGLU_3/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_3/residual_block_skip_conv_ResidualConv1DGLU_3/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_3/residual_block_out_conv_ResidualConv1DGLU_3/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_3/residual_block_out_conv_ResidualConv1DGLU_3/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_4/residual_block_causal_conv_ResidualConv1DGLU_4/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_4/residual_block_causal_conv_ResidualConv1DGLU_4/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_4/residual_block_cin_conv_ResidualConv1DGLU_4/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_4/residual_block_cin_conv_ResidualConv1DGLU_4/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_4/residual_block_skip_conv_ResidualConv1DGLU_4/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_4/residual_block_skip_conv_ResidualConv1DGLU_4/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_4/residual_block_out_conv_ResidualConv1DGLU_4/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_4/residual_block_out_conv_ResidualConv1DGLU_4/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_5/residual_block_causal_conv_ResidualConv1DGLU_5/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_5/residual_block_causal_conv_ResidualConv1DGLU_5/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_5/residual_block_cin_conv_ResidualConv1DGLU_5/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_5/residual_block_cin_conv_ResidualConv1DGLU_5/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_5/residual_block_skip_conv_ResidualConv1DGLU_5/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_5/residual_block_skip_conv_ResidualConv1DGLU_5/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_5/residual_block_out_conv_ResidualConv1DGLU_5/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_5/residual_block_out_conv_ResidualConv1DGLU_5/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_6/residual_block_causal_conv_ResidualConv1DGLU_6/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_6/residual_block_causal_conv_ResidualConv1DGLU_6/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_6/residual_block_cin_conv_ResidualConv1DGLU_6/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_6/residual_block_cin_conv_ResidualConv1DGLU_6/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_6/residual_block_skip_conv_ResidualConv1DGLU_6/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_6/residual_block_skip_conv_ResidualConv1DGLU_6/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_6/residual_block_out_conv_ResidualConv1DGLU_6/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_6/residual_block_out_conv_ResidualConv1DGLU_6/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_7/residual_block_causal_conv_ResidualConv1DGLU_7/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_7/residual_block_causal_conv_ResidualConv1DGLU_7/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_7/residual_block_cin_conv_ResidualConv1DGLU_7/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_7/residual_block_cin_conv_ResidualConv1DGLU_7/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_7/residual_block_skip_conv_ResidualConv1DGLU_7/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_7/residual_block_skip_conv_ResidualConv1DGLU_7/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_7/residual_block_out_conv_ResidualConv1DGLU_7/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_7/residual_block_out_conv_ResidualConv1DGLU_7/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_8/residual_block_causal_conv_ResidualConv1DGLU_8/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_8/residual_block_causal_conv_ResidualConv1DGLU_8/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_8/residual_block_cin_conv_ResidualConv1DGLU_8/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_8/residual_block_cin_conv_ResidualConv1DGLU_8/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_8/residual_block_skip_conv_ResidualConv1DGLU_8/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_8/residual_block_skip_conv_ResidualConv1DGLU_8/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_8/residual_block_out_conv_ResidualConv1DGLU_8/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_8/residual_block_out_conv_ResidualConv1DGLU_8/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_9/residual_block_causal_conv_ResidualConv1DGLU_9/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_9/residual_block_causal_conv_ResidualConv1DGLU_9/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_9/residual_block_cin_conv_ResidualConv1DGLU_9/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_9/residual_block_cin_conv_ResidualConv1DGLU_9/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_9/residual_block_skip_conv_ResidualConv1DGLU_9/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_9/residual_block_skip_conv_ResidualConv1DGLU_9/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_9/residual_block_out_conv_ResidualConv1DGLU_9/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_9/residual_block_out_conv_ResidualConv1DGLU_9/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_10/residual_block_causal_conv_ResidualConv1DGLU_10/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_10/residual_block_causal_conv_ResidualConv1DGLU_10/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_10/residual_block_cin_conv_ResidualConv1DGLU_10/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_10/residual_block_cin_conv_ResidualConv1DGLU_10/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_10/residual_block_skip_conv_ResidualConv1DGLU_10/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_10/residual_block_skip_conv_ResidualConv1DGLU_10/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_10/residual_block_out_conv_ResidualConv1DGLU_10/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_10/residual_block_out_conv_ResidualConv1DGLU_10/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_11/residual_block_causal_conv_ResidualConv1DGLU_11/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_11/residual_block_causal_conv_ResidualConv1DGLU_11/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_11/residual_block_cin_conv_ResidualConv1DGLU_11/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_11/residual_block_cin_conv_ResidualConv1DGLU_11/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_11/residual_block_skip_conv_ResidualConv1DGLU_11/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_11/residual_block_skip_conv_ResidualConv1DGLU_11/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_11/residual_block_out_conv_ResidualConv1DGLU_11/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_11/residual_block_out_conv_ResidualConv1DGLU_11/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_12/residual_block_causal_conv_ResidualConv1DGLU_12/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_12/residual_block_causal_conv_ResidualConv1DGLU_12/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_12/residual_block_cin_conv_ResidualConv1DGLU_12/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_12/residual_block_cin_conv_ResidualConv1DGLU_12/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_12/residual_block_skip_conv_ResidualConv1DGLU_12/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_12/residual_block_skip_conv_ResidualConv1DGLU_12/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_12/residual_block_out_conv_ResidualConv1DGLU_12/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_12/residual_block_out_conv_ResidualConv1DGLU_12/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_13/residual_block_causal_conv_ResidualConv1DGLU_13/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_13/residual_block_causal_conv_ResidualConv1DGLU_13/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_13/residual_block_cin_conv_ResidualConv1DGLU_13/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_13/residual_block_cin_conv_ResidualConv1DGLU_13/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_13/residual_block_skip_conv_ResidualConv1DGLU_13/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_13/residual_block_skip_conv_ResidualConv1DGLU_13/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_13/residual_block_out_conv_ResidualConv1DGLU_13/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_13/residual_block_out_conv_ResidualConv1DGLU_13/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_14/residual_block_causal_conv_ResidualConv1DGLU_14/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_14/residual_block_causal_conv_ResidualConv1DGLU_14/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_14/residual_block_cin_conv_ResidualConv1DGLU_14/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_14/residual_block_cin_conv_ResidualConv1DGLU_14/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_14/residual_block_skip_conv_ResidualConv1DGLU_14/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_14/residual_block_skip_conv_ResidualConv1DGLU_14/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_14/residual_block_out_conv_ResidualConv1DGLU_14/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_14/residual_block_out_conv_ResidualConv1DGLU_14/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_15/residual_block_causal_conv_ResidualConv1DGLU_15/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_15/residual_block_causal_conv_ResidualConv1DGLU_15/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_15/residual_block_cin_conv_ResidualConv1DGLU_15/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_15/residual_block_cin_conv_ResidualConv1DGLU_15/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_15/residual_block_skip_conv_ResidualConv1DGLU_15/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_15/residual_block_skip_conv_ResidualConv1DGLU_15/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_15/residual_block_out_conv_ResidualConv1DGLU_15/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_15/residual_block_out_conv_ResidualConv1DGLU_15/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_16/residual_block_causal_conv_ResidualConv1DGLU_16/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_16/residual_block_causal_conv_ResidualConv1DGLU_16/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_16/residual_block_cin_conv_ResidualConv1DGLU_16/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_16/residual_block_cin_conv_ResidualConv1DGLU_16/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_16/residual_block_skip_conv_ResidualConv1DGLU_16/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_16/residual_block_skip_conv_ResidualConv1DGLU_16/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_16/residual_block_out_conv_ResidualConv1DGLU_16/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_16/residual_block_out_conv_ResidualConv1DGLU_16/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_17/residual_block_causal_conv_ResidualConv1DGLU_17/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_17/residual_block_causal_conv_ResidualConv1DGLU_17/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_17/residual_block_cin_conv_ResidualConv1DGLU_17/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_17/residual_block_cin_conv_ResidualConv1DGLU_17/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_17/residual_block_skip_conv_ResidualConv1DGLU_17/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_17/residual_block_skip_conv_ResidualConv1DGLU_17/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_17/residual_block_out_conv_ResidualConv1DGLU_17/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_17/residual_block_out_conv_ResidualConv1DGLU_17/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_18/residual_block_causal_conv_ResidualConv1DGLU_18/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_18/residual_block_causal_conv_ResidualConv1DGLU_18/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_18/residual_block_cin_conv_ResidualConv1DGLU_18/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_18/residual_block_cin_conv_ResidualConv1DGLU_18/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_18/residual_block_skip_conv_ResidualConv1DGLU_18/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_18/residual_block_skip_conv_ResidualConv1DGLU_18/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_18/residual_block_out_conv_ResidualConv1DGLU_18/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_18/residual_block_out_conv_ResidualConv1DGLU_18/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_19/residual_block_causal_conv_ResidualConv1DGLU_19/kernel:0 (float32_ref 3x128x256) [98304, bytes: 393216]
inference/ResidualConv1DGLU_19/residual_block_causal_conv_ResidualConv1DGLU_19/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_19/residual_block_cin_conv_ResidualConv1DGLU_19/kernel:0 (float32_ref 1x80x256) [20480, bytes: 81920]
inference/ResidualConv1DGLU_19/residual_block_cin_conv_ResidualConv1DGLU_19/bias:0 (float32_ref 256) [256, bytes: 1024]
inference/ResidualConv1DGLU_19/residual_block_skip_conv_ResidualConv1DGLU_19/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_19/residual_block_skip_conv_ResidualConv1DGLU_19/bias:0 (float32_ref 128) [128, bytes: 512]
inference/ResidualConv1DGLU_19/residual_block_out_conv_ResidualConv1DGLU_19/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/ResidualConv1DGLU_19/residual_block_out_conv_ResidualConv1DGLU_19/bias:0 (float32_ref 128) [128, bytes: 512]
inference/final_convolution_1/kernel:0 (float32_ref 1x128x128) [16384, bytes: 65536]
inference/final_convolution_1/bias:0 (float32_ref 128) [128, bytes: 512]
inference/final_convolution_2/kernel:0 (float32_ref 1x128x2) [256, bytes: 1024]
inference/final_convolution_2/bias:0 (float32_ref 2) [2, bytes: 8]
Total size of variables: 3293986
Total bytes of variables: 13175944
(3293986, 13175944)
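As a sanity check on the listing above, the following sketch re-derives the reported totals from the printed shapes alone. It assumes only what the printout shows: each size is the product of the shape dimensions, each float32 value takes 4 bytes, and there are 20 residual blocks (ResidualConv1DGLU_0 to ResidualConv1DGLU_19) with identical shapes. The regular expression and the function name `line_is_consistent` are illustrative, not part of slim.

```python
import re
from math import prod

# Matches e.g. "(float32_ref 1x11x80x80) [70400, bytes: 281600]"
LINE = re.compile(r"\(float32_ref ([\dx]+)\) \[(\d+), bytes: (\d+)\]")

def line_is_consistent(line):
    """Check that size == product of dims and bytes == 4 * size for one printed line."""
    m = LINE.search(line)
    shape = [int(d) for d in m.group(1).split("x")]
    size, nbytes = int(m.group(2)), int(m.group(3))
    return prod(shape) == size and nbytes == 4 * size

samples = [
    "inference/ConvTranspose1D_layer_0/kernel:0 (float32_ref 1x11x80x80) [70400, bytes: 281600]",
    "inference/input_convolution/kernel:0 (float32_ref 1x1x128) [128, bytes: 512]",
    "inference/final_convolution_2/bias:0 (float32_ref 2) [2, bytes: 8]",
]
all_consistent = all(line_is_consistent(s) for s in samples)

# Kernel + bias counts per group, copied from the shapes in the printout.
upsample = (1 * 11 * 80 * 80 + 80) + (1 * 25 * 80 * 80 + 80)
input_conv = 1 * 1 * 128 + 128
block = (
    (3 * 128 * 256 + 256)        # causal conv
    + (1 * 80 * 256 + 256)       # cin conv
    + 2 * (1 * 128 * 128 + 128)  # skip conv and out conv
)
final = (1 * 128 * 128 + 128) + (1 * 128 * 2 + 2)

total = upsample + input_conv + 20 * block + final
total_bytes = 4 * total
print(all_consistent, total, total_bytes)
```

The sum reproduces the reported 3293986 variables and 13175944 bytes, so the printout is internally consistent and no layer is missing from it.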