How to change transcription output granularity: word by word vs. group of words?

alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

Apache License 2.0

How to change transcription output granularity: word by word vs. group of words? #1289

Open Richard31S opened 1 year ago

Richard31S commented 1 year ago

Hi,

I would like the quickest possible transcription response, meaning word-by-word output. To do that, I use PartialResult, but the output only updates in groups of 4 to 6 words. I read a post saying that setting SetPartialWords to off should allow this. However, I do not see any change, whatever value I set.

I guess there is a trade-off: do I want a high output update rate with potentially low confidence, or do I want the output updated only once confidence has been consolidated over several words? In my case, I am more interested in the first option.

Is there a way to force word-by-word output through the PartialResult function? Many thanks,

nshmyrev commented 1 year ago

You probably want to send smaller chunks to the recognizer. I'm not sure how big your chunks are, but with chunks of 0.2 seconds, for example, it will send partials word by word.
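As a rough illustration of the chunk-size suggestion above, here is a minimal Python sketch. The helper names `chunk_bytes` and `stream_partials` are hypothetical, and the 16 kHz, 16-bit mono PCM defaults are assumptions, not something stated in this thread. The recognizer is expected to expose `AcceptWaveform`, `PartialResult` and `Result`, as `vosk.KaldiRecognizer` does.

```python
import json


def chunk_bytes(seconds, sample_rate=16000, sample_width=2, channels=1):
    """Return how many raw PCM bytes cover `seconds` of audio.

    Defaults (16 kHz, 16-bit, mono) are assumptions; match them to your audio.
    """
    frames = round(sample_rate * seconds)
    return frames * sample_width * channels


def stream_partials(recognizer, pcm, chunk_size):
    """Feed `pcm` to the recognizer in fixed-size chunks.

    Returns (partials, finals): every partial string that differed from the
    previous one, and the final text of each completed utterance.
    """
    partials, finals = [], []
    last = ""
    for start in range(0, len(pcm), chunk_size):
        chunk = pcm[start:start + chunk_size]
        if recognizer.AcceptWaveform(chunk):
            # End of utterance detected: read the final result, restart partials.
            finals.append(json.loads(recognizer.Result()).get("text", ""))
            last = ""
            continue
        partial = json.loads(recognizer.PartialResult()).get("partial", "")
        if partial != last:
            partials.append(partial)
            last = partial
    return partials, finals
```

Under these assumptions, `chunk_bytes(0.2)` gives 6400 bytes per chunk. Smaller chunks only mean the recognizer is polled more often; when a new word actually shows up in the partial still depends on the decoder.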

Richard31S commented 1 year ago

I forgot to mention that point in my previous message. The chunk size I use is 250 ms, and I tried reducing it to 60 ms with the same result. Do I understand correctly from your answer that nothing other than a large (meaning multi-word) chunk size would explain this behaviour?

nshmyrev commented 1 year ago

Well, have you seen the test file in the examples? It works OK. You need to set partial words to False.

{
  "partial" : ""
}
{
  "partial" : ""
}
{
  "partial" : ""
}
{
  "partial" : "zero"
}
{
  "partial" : "zero one"
}
{
  "partial" : "zero one eight six"
}
{
  "partial" : "zero one eight zero"
}
{
  "partial" : "zero one eight zero"
}
{
  "partial" : "zero one eight zero three"
}
{