Hello! The Jasper / Wav2Letter+ models consist of blocks with dilated convolutions. It seems that they can't work online, because the overall time context (the receptive field) is usually wider than the chunk of audio available at a time.
One could pad the future context instead, as long as the quality remains comparable, I guess?
Am I correct, or have I miscalculated the receptive field?
Thanks in advance.
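For reference, here is a minimal sketch of how the receptive field of a stack of 1D convolutions can be computed, so the calculation can be checked. The `(kernel_size, dilation, stride)` values in `example_layers` are hypothetical and are not the actual Jasper or Wav2Letter+ configuration, and the 10 ms frame shift is likewise an assumption. The future context estimate assumes odd kernels with symmetric ("same") padding.

```python
def receptive_field(layers):
    """Receptive field, in input frames, of stacked 1D convolutions.

    layers: iterable of (kernel_size, dilation, stride) tuples,
            ordered from the input side to the output side.
    """
    rf = 1    # a single output frame sees at least one input frame
    jump = 1  # spacing, in input frames, between adjacent frames at this depth
    for kernel_size, dilation, stride in layers:
        rf += (kernel_size - 1) * dilation * jump
        jump *= stride
    return rf


def future_context(layers):
    """Look-ahead in input frames, assuming symmetric padding and odd kernels."""
    return (receptive_field(layers) - 1) // 2


# Hypothetical layer stack, for illustration only (NOT a real model config).
example_layers = [
    (11, 1, 2),  # strided first layer
    (11, 1, 1),
    (13, 1, 1),
    (29, 2, 1),  # dilated layer
    (1, 1, 1),   # pointwise layer adds no context
]

FRAME_SHIFT_MS = 10  # assumed feature frame shift

rf_frames = receptive_field(example_layers)
ahead_frames = future_context(example_layers)
print(f"receptive field: {rf_frames} frames ({rf_frames * FRAME_SHIFT_MS} ms)")
print(f"future context:  {ahead_frames} frames ({ahead_frames * FRAME_SHIFT_MS} ms)")
```

With these made-up values the stack sees 167 input frames, 83 of them in the future. If the look-ahead computed this way for the real configuration exceeds the chunk length, the model cannot emit output for a chunk without either waiting for more audio or padding the missing future frames.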