bill-kalog closed this issue 3 years ago.
Context helps recognition even with "greedy" decoding. Jasper and QuartzNet implicitly learn a fairly good language model. As a result, it is much easier for them to spell the word "constitution" correctly as part of the phrase "founding fathers wrote constitution" than when the word is spoken on its own.
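To make "greedy" concrete: with CTC-style greedy decoding there is no external language model. Each output frame's most likely token is picked independently, repeats are collapsed, and blanks are dropped. Any "language model" effect must therefore come from the network's per-frame outputs, which depend on neighbouring audio through the convolutions. The sketch below is a minimal illustration, not the Jasper/QuartzNet code; the blank index (0 here), the `vocab` string, and the function name `ctc_greedy_decode` are assumptions made for this example.

```python
from typing import List, Optional, Sequence

# Assumption for this sketch: index 0 is the CTC blank; index i > 0 maps to vocab[i - 1].
BLANK = 0


def ctc_greedy_decode(log_probs: Sequence[Sequence[float]], vocab: str) -> str:
    """Hypothetical helper: argmax per frame, collapse repeats, then drop blanks."""
    # Most likely token index for every frame, chosen independently of the other frames.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in log_probs]
    out: List[str] = []
    prev: Optional[int] = None
    for idx in best:
        # Emit only when the token changes and is not the blank.
        if idx != prev and idx != BLANK:
            out.append(vocab[idx - 1])
        prev = idx
    return "".join(out)


if __name__ == "__main__":
    frames = [
        [0.1, 0.8, 0.1],   # a
        [0.1, 0.8, 0.1],   # a (repeat, collapsed)
        [0.9, 0.05, 0.05], # blank separates the two a's
        [0.1, 0.8, 0.1],   # a
        [0.1, 0.1, 0.8],   # b
    ]
    print(ctc_greedy_decode(frames, "ab"))
```

Because the decoder never looks across frames, a misspelled isolated word can only be corrected by what the acoustic model itself outputs.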
Hi,
Have you noticed reduced accuracy when running inference on single-word audio files with QuartzNet or Jasper, compared to longer sentences? Do you think the kernel size might be affecting this?
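One way to reason about the kernel-size question is the receptive field: how many input frames can influence a single output frame of a stack of 1D convolutions. The sketch below uses the standard formula for stacked convolutions; the layer list is hypothetical and is not the actual Jasper or QuartzNet configuration. If the receptive field is much longer than a single-word clip, most of each output's context window covers padding rather than speech, which is one plausible (untested here) factor in the accuracy gap.

```python
from typing import Sequence, Tuple

# Each layer is (kernel_size, stride, dilation). Values used below are hypothetical.
Layer = Tuple[int, int, int]


def receptive_field(layers: Sequence[Layer]) -> int:
    """Number of input frames that can influence one output frame."""
    rf = 1
    jump = 1  # spacing, in input frames, between adjacent outputs of the current layer
    for kernel, stride, dilation in layers:
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    hypothetical_layers = [(33, 2, 1), (33, 1, 1), (39, 1, 1)]
    rf = receptive_field(hypothetical_layers)
    # Assumption: 10 ms between input feature frames.
    print(f"receptive field: {rf} frames (~{rf * 0.01:.2f} s at a 10 ms hop)")
```

Comparing this number with the length of a typical single-word clip (in frames) gives a quick sense of how much real context each output actually sees.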