NVIDIA / nv-wavenet

Reference implementation of real-time autoregressive wavenet inference
BSD 3-Clause "New" or "Revised" License

Support kernel size of 3, to replicate Tacotron 2 #21

Open PetrochukM opened 6 years ago

PetrochukM commented 6 years ago

Hi There!

Can you support a kernel size of 3 please?


Reading the Tacotron 2 paper more closely, it looks like they use a kernel size of 3. Otherwise, they could not have reached a 505-sample receptive field with 24 layers arranged as 4 cycles of 6 layers each.

With a kernel size of 2, the receptive field comes out to: (1 + 2 + 4 + 8 + 16 + 32) * 4 + 1 == 253

With a kernel size of 3, it comes out to: (2 + 4 + 8 + 16 + 32 + 64) * 4 + 1 == 505
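
For anyone who wants to check the arithmetic above, here is a minimal sketch. It assumes the dilation doubles at every layer within a cycle (1, 2, 4, ...) and resets at the start of each cycle, so each layer widens the receptive field by (kernel_size - 1) * dilation samples. The function name `receptive_field` and its parameters are made up for this illustration and are not part of nv-wavenet.

```python
def receptive_field(kernel_size, layers_per_cycle, num_cycles):
    """Receptive field, in samples, of a stack of dilated causal convolutions.

    Assumes dilations of 1, 2, 4, ... within each cycle, repeated for every cycle.
    """
    dilations = [2 ** i for i in range(layers_per_cycle)]
    # Each layer extends the receptive field by (kernel_size - 1) * dilation.
    per_cycle = sum((kernel_size - 1) * d for d in dilations)
    # The trailing +1 accounts for the current sample itself.
    return per_cycle * num_cycles + 1


if __name__ == "__main__":
    print(receptive_field(kernel_size=2, layers_per_cycle=6, num_cycles=4))  # 253
    print(receptive_field(kernel_size=3, layers_per_cycle=6, num_cycles=4))  # 505
```

Under these assumptions, only a kernel size of 3 matches the 505-sample figure for 24 layers in 4 cycles of 6.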


Similarly, the Parallel WaveNet paper used a kernel size of 3:

"This required a WaveNet with a wider receptive field, which we achieved by increasing the dilated convolution filter size from 2 to 3."