Reading the Tacotron 2 paper more closely, it looks like they are using a kernel size of 3. Otherwise, they could not have achieved a 505-sample receptive field with 24 layers arranged as 4 dilation cycles of 6 layers each.
With a kernel size of 2, the math works out to:
(1 + 2 + 4 + 8 + 16 + 32) * 4 + 1 == 253
With a kernel size of 3, the math works out to:
(2 + 4 + 8 + 16 + 32 + 64) * 4 + 1 == 505
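The arithmetic above follows from the standard receptive-field formula for stacked dilated causal convolutions (each layer with dilation d adds (kernel_size - 1) * d samples). A minimal sketch, with the function name `receptive_field` being my own illustration rather than anything from the papers:

```python
def receptive_field(kernel_size, layers_per_cycle, cycles):
    # Dilations double within each cycle: 1, 2, 4, ..., 2^(layers_per_cycle - 1),
    # and the pattern repeats for every cycle.
    dilations = [2 ** i for i in range(layers_per_cycle)] * cycles
    # Each layer with dilation d widens the receptive field by (kernel_size - 1) * d;
    # the +1 accounts for the current sample itself.
    return sum((kernel_size - 1) * d for d in dilations) + 1

print(receptive_field(2, 6, 4))  # 253
print(receptive_field(3, 6, 4))  # 505
```

This reproduces both numbers above, which is why a kernel size of 3 seems to be the only way to reach 505 with this layer configuration.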
Similarly, the parallel WaveNet paper used a kernel size of 3:

> This required a WaveNet with a wider receptive field, which we achieved by increasing the dilated convolution filter size from 2 to 3.
Hi there! Can you add support for a kernel size of 3, please?