rafiuddinkhan opened this issue 3 years ago
We don't have a separate VAD; you can only get word times.
@nshmyrev, VAD with noise adaptation would really improve accuracy, because the current STT model predicts some unknown words when there is background noise.
This VAD, which runs on mobile, looks promising: https://github.com/SIP-Lab/CNN-VAD
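CNN-VAD is a learned model. For comparison, here is a much simpler technique, sketched under stated assumptions: an energy-threshold VAD with an adaptive noise floor. It assumes 16-bit mono PCM already split into fixed-size frames (plain lists of ints), and `ratio`, `alpha` and `min_floor` are hypothetical tuning parameters, not Vosk settings. Frames labelled as speech could then be the only ones passed on to the recognizer.

```python
import math
from typing import List, Sequence


def frame_rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of 16-bit PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def energy_vad(frames: Sequence[Sequence[int]],
               ratio: float = 3.0,
               alpha: float = 0.95,
               min_floor: float = 1.0) -> List[bool]:
    """Label each frame as speech (True) or non-speech (False).

    A frame counts as speech when its RMS exceeds `ratio` times the current
    noise-floor estimate. The floor is an exponential moving average that is
    only updated on non-speech frames, so it follows slowly changing
    background noise without absorbing the speech itself.
    """
    labels: List[bool] = []
    noise_floor = None
    for frame in frames:
        rms = frame_rms(frame)
        if noise_floor is None:
            # Assumption: the first frame contains only background noise.
            noise_floor = max(rms, min_floor)
            labels.append(False)
            continue
        is_speech = rms > ratio * noise_floor
        if not is_speech:
            # Adapt the noise floor only while no speech is detected.
            noise_floor = max(alpha * noise_floor + (1 - alpha) * rms, min_floor)
        labels.append(is_speech)
    return labels


quiet = [100, -100] * 80   # 160 samples, i.e. 10 ms at 16 kHz
loud = [3000, -3000] * 80
print(energy_vad([quiet, quiet, loud, loud, quiet]))
# [False, False, True, True, False]
```

A fixed energy ratio is easily fooled by loud non-stationary noise, which is where a trained model such as CNN-VAD would be expected to do better.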
Is there any way to add punctuation, or to post-process the VOSK output into properly formatted sentences?
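As a very rough stand-in for real punctuation restoration, a minimal post-processing sketch could tidy up a final result. It assumes the recognizer's final result is the usual JSON string with a `"text"` field; `format_sentence` is a hypothetical helper. It only capitalises, fixes a standalone "i" and adds a full stop; commas and sentence splitting would need a dedicated punctuation model.

```python
import json


def format_sentence(vosk_result_json: str) -> str:
    """Turn a final recognition result into a capitalised sentence.

    Naive heuristic only: no commas, no sentence splitting.
    """
    text = json.loads(vosk_result_json).get("text", "").strip()
    if not text:
        return ""
    # English models typically emit lowercase text; restore the pronoun "I".
    words = ["I" if w == "i" else w for w in text.split()]
    text = " ".join(words)
    # Capitalise the first letter and terminate the sentence.
    text = text[0].upper() + text[1:]
    if text[-1] not in ".?!":
        text += "."
    return text


print(format_sentence('{"text": "hello i am here"}'))
# Hello I am here.
```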
The Android build of the VOSK API works great for speech-to-text.
I would like to get the start and end times of the speech in the audio buffer for chunking, if there is any way to do that.
Currently we get the start and end time (in seconds) of each word predicted by Vosk, but if it mispredicts a word, we get a wrong start and end time.
Thanks,
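Building on the word times that are already available, a minimal sketch of deriving speech chunks: merge consecutive words into `(start, end)` segments and optionally drop low-confidence words first. It assumes word output is enabled so that the result JSON has a `"result"` list of objects with `"start"`, `"end"`, `"word"` and `"conf"`; `speech_segments`, `max_gap` and `min_conf` are hypothetical names and thresholds for illustration.

```python
import json
from typing import List, Tuple


def speech_segments(vosk_result_json: str,
                    max_gap: float = 0.5,
                    min_conf: float = 0.0) -> List[Tuple[float, float]]:
    """Merge word timings from a recognition result into (start, end) segments in seconds.

    Words whose "conf" is below `min_conf` are skipped, so a low-confidence
    (possibly mispredicted) word does not create or stretch a segment.
    Consecutive words separated by at most `max_gap` seconds are merged.
    """
    words = json.loads(vosk_result_json).get("result", [])
    segments: List[Tuple[float, float]] = []
    for w in words:
        if w.get("conf", 1.0) < min_conf:
            continue
        start, end = float(w["start"]), float(w["end"])
        if segments and start - segments[-1][1] <= max_gap:
            # Close enough to the previous segment: extend it.
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments


result = json.dumps({"result": [
    {"conf": 1.0, "start": 0.30, "end": 0.60, "word": "hello"},
    {"conf": 0.95, "start": 0.66, "end": 1.02, "word": "world"},
    {"conf": 0.40, "start": 1.80, "end": 2.10, "word": "uh"},
    {"conf": 1.0, "start": 3.00, "end": 3.40, "word": "bye"},
]})
print(speech_segments(result))                # [(0.3, 1.02), (1.8, 2.1), (3.0, 3.4)]
print(speech_segments(result, min_conf=0.5))  # [(0.3, 1.02), (3.0, 3.4)]
```

Filtering on confidence is only a heuristic: it can remove a doubtful word from a chunk, but it cannot correct the boundaries of a word the model placed wrongly.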