Closed gregtzar closed 4 months ago
We have models that restore case and punctuation: https://alphacephei.com/vosk/models/vosk-recasepunc-en-0.22.zip
@nshmyrev As a practical matter from the vosk-api side: could I use one of these models as a substitute for the model I'm already using, in this case vosk-model-en-us-0.42-gigaspeech? Would the recognizer then segment the results based on it, so that the text property of the JSON results contains the punctuation and casing? I appreciate the help; I'm just trying to get my head around how to use what you've linked in conjunction with the API. (I'm using your Go wrapper, though that's probably not relevant.)
Unfortunately, you cannot use those models from Go yet; for now they probably only work through a separate Python server.
If you need punctuation, you can also try Whisper.
I'm just creating a simple pipeline that converts audio/video tracks into human-readable text transcripts. At the basic level I can of course treat each "Result" that the Vosk recognizer returns, based on its default timings, as a sentence and put the punctuation back in myself. But are there other open source libraries or methodologies that you would recommend for more advanced post-processing of this kind? I imagine that Vosk's complete output of words and timestamps could get a more intelligent treatment from some library. I have not had much luck finding any; maybe I'm not searching for the right terms. Thanks!
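For what it's worth, the basic approach described above (using timing to guess sentence boundaries) can be sketched in Go roughly as follows. This is only a naive pause-based heuristic, not the recasepunc model, and it assumes word timestamps are enabled on the recognizer so that each result carries a "result" array of words with "start" and "end" times. The names Word, Result, SegmentByPause and the 0.5 second threshold are made up for illustration, and the sample JSON is hand-written.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"unicode"
)

// Word mirrors one entry of the "result" array in a Vosk result
// when word timestamps are enabled.
type Word struct {
	Word  string  `json:"word"`
	Start float64 `json:"start"`
	End   float64 `json:"end"`
	Conf  float64 `json:"conf"`
}

// Result mirrors the JSON of one finalized utterance (illustrative type).
type Result struct {
	Words []Word `json:"result"`
	Text  string `json:"text"`
}

// SegmentByPause starts a new sentence wherever the silence between two
// consecutive words is at least minPause seconds. It is a crude heuristic:
// it always ends a sentence with a period and cannot detect questions.
func SegmentByPause(words []Word, minPause float64) []string {
	var sentences []string
	var current []string
	for i, w := range words {
		if i > 0 && len(current) > 0 && w.Start-words[i-1].End >= minPause {
			sentences = append(sentences, finish(current))
			current = nil
		}
		current = append(current, w.Word)
	}
	if len(current) > 0 {
		sentences = append(sentences, finish(current))
	}
	return sentences
}

// finish joins the tokens, capitalizes the first letter and appends a period.
func finish(tokens []string) string {
	r := []rune(strings.Join(tokens, " "))
	if len(r) == 0 {
		return ""
	}
	r[0] = unicode.ToUpper(r[0])
	return string(r) + "."
}

func main() {
	// Hand-written sample in the shape of a Vosk result with word timings.
	sample := `{"result":[
		{"word":"hello","start":0.0,"end":0.4,"conf":1.0},
		{"word":"world","start":0.5,"end":0.9,"conf":1.0},
		{"word":"how","start":1.7,"end":1.9,"conf":1.0},
		{"word":"are","start":2.0,"end":2.2,"conf":1.0},
		{"word":"you","start":2.3,"end":2.5,"conf":1.0}],
		"text":"hello world how are you"}`

	var res Result
	if err := json.Unmarshal([]byte(sample), &res); err != nil {
		panic(err)
	}
	for _, s := range SegmentByPause(res.Words, 0.5) {
		fmt.Println(s)
	}
}
```

With the sample above this prints "Hello world." and "How are you." on separate lines. Note that the second sentence should really end in a question mark, which is exactly the kind of thing a pause heuristic cannot recover and a punctuation model can.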