Closed flassTer closed 5 years ago
Hello George, there is an experimental tool for that. We tested it with Jasper models. First of all, you'll need to install Baidu's beam search decoder:
scripts/install_decoders.sh
Then please follow these instructions: https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition/speech-to-text-align.html
Thank you @vsl9
Hello team, does OpenSeq2Seq support audio captioning (detecting the start time and end time of a predicted word)? Thanks.