Hi, it looks like there is a typo in the spreadsheets:
The output of the first DeepSpeech convolution does not fit the input of the second layer:
out_W = (W + 2 pad_w - filter_w + 1) / stride_w = (700 + 0 - 5 + 1) / 2 = 348
out_H = (H + 2 pad_h - filter_h + 1) / stride_h = (161 + 0 - 20 + 1) / 2 = 71
I guess R should be the filter height and S should be the filter width. In that case the DeepSpeech layers fit perfectly (results rounded up):
out_W = (700 + 0 - 20 + 1) / 2 = 341
out_H = (161 + 0 - 5 + 1) / 2 = 79
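For anyone who wants to re-run the arithmetic, here is a minimal Python sketch of the check. The helper name `conv_out` is made up for illustration, the values R = 5 and S = 20 are inferred from the numbers above, and rounding up is an assumption based on 681 / 2 = 340.5 being reported as 341.

```python
import math

def conv_out(size, filt, stride, pad=0):
    """Output size per the formula above, rounded up (assumed rounding mode)."""
    return math.ceil((size + 2 * pad - filt + 1) / stride)

W, H, stride = 700, 161, 2
R, S = 5, 20  # inferred from the calculations in this report

# Reading as currently labeled: R is the filter width, S is the filter height.
print("R=width, S=height:", conv_out(W, R, stride), conv_out(H, S, stride))

# Proposed reading: R is the filter height, S is the filter width.
print("R=height, S=width:", conv_out(W, S, stride), conv_out(H, R, stride))
```

Only the second reading gives the 341 x 79 shape that fits the next layer.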
Please also check the KWS case for the same issue.