Open fquirin opened 3 years ago
Just in case anyone wants to reproduce the exact same results with my server, here is the options object for the 'welcome' event:
The audio file was: test-audio/easy_counting_en2.ogg
I asked myself the same question (using my node Vosk wrapper: https://github.com/solyarisoftware/voskJs). I believe that's a Vosk v0.3.30 change:
$ voskjs --audio=audio/2830-3980-0043.wav --model=models/vosk-model-small-en-us-0.15 --alternatives=3
model directory : models/vosk-model-small-en-us-0.15
speech file name : audio/2830-3980-0043.wav
grammar : not specified. Default: NO
sample rate : not specified. Default: 16000
max alternatives : 3
text only / JSON : JSON
Vosk debug level : -1
load model latency : 362ms
{
alternatives: [
{
confidence: 175.552368,
result: [
{ end: 1.02, start: 0.36, word: 'experience' },
{ end: 1.35, start: 1.02, word: 'proves' },
{ end: 1.98, start: 1.35, word: 'this' }
],
text: ' experience proves this'
}
]
}
transcript latency : 587ms
By contrast, in previous Vosk releases (e.g. v0.2.27), each item of the result array included a per-word confidence (<= 1): https://github.com/solyarisoftware/voskJs/tree/master/examples#simple-program-for-a-sentence-based-speech-to-text, whereas here there is a single confidence value for each "alternative" result.
The change is not clear to me either. It seems that confidence is now a value for each result (sentence) instead of for each word. I still do not understand why the confidence value is > 1.
I'm realizing there is a small related change (maybe a minor format bug).
If I do NOT call setAlternatives(), I get the old format:
{
result: [
{ conf: 1, end: 1.02, start: 0.36, word: 'experience' },
{ conf: 1, end: 1.35, start: 1.02, word: 'proves' },
{ conf: 1, end: 1.74, start: 1.35, word: 'this' }
],
text: 'experience proves this'
}
So the confidence is set to 1 for each word (which makes sense, though it may be useless).
If I do call setAlternatives(), I instead get the new format:
{
alternatives: [
{
confidence: 197.583099,
result: [
{ end: 1.02, start: 0.36, word: 'experience' },
{ end: 1.35, start: 1.02, word: 'proves' },
{ end: 1.98, start: 1.35, word: 'this' }
],
text: ' experience proves this'
}
]
}
Minor points. Just a reminder.
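For client code that has to cope with both shapes, a minimal sketch could normalize them into one structure. This is not part of Vosk or voskJs: `normalizeResult` and its output field names (`sentenceConfidence`, `words`) are hypothetical, and the sketch assumes the two formats are exactly as shown above.

```javascript
// Normalizes both result shapes quoted above into an array of
// { text, sentenceConfidence, words } objects.
// NOTE: normalizeResult and its field names are hypothetical helpers,
// not part of the Vosk or voskJs API.
function normalizeResult(result) {
  // New shape: { alternatives: [ { confidence, result, text }, ... ] }
  if (Array.isArray(result.alternatives)) {
    return result.alternatives.map((alt) => ({
      text: (alt.text || '').trim(), // alternatives carry a leading space
      sentenceConfidence: alt.confidence, // unnormalized score, can be > 1
      words: (alt.result || []).map((w) => ({ ...w, conf: null })),
    }));
  }
  // Old shape: { result: [ { conf, end, start, word }, ... ], text }
  return [
    {
      text: (result.text || '').trim(),
      sentenceConfidence: null, // no sentence-level score in this shape
      words: (result.result || []).map((w) => ({ ...w })),
    },
  ];
}
```

With the two objects above as input, the old format yields one entry whose words keep their `conf` values, while the new format yields one entry per alternative with `sentenceConfidence` set and per-word `conf` left as `null`.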
result object items included the confidence ( <=1 ) for each word
Yes, you are right. There is another funny thing: If you set alternatives to 0 and words to true you get "confidence: 1" for each word, if you set alternatives to 1 (which is essentially the same as 0) the "confidence" field for each word doesn't show up at all ;-)
Alternatives confidence is not fully functional yet, we will change it in coming versions.
Alternatives 0 enables MBR mode, which gives per-word confidences; it's a different story.
I see. What about the general confidence values ~500 etc. (alternatives 1, words false)? ~Can we simply scale this by some factor or does it depend on dynamic properties like length of the input ... ?~ Should we ignore this for now?
Hi @nshmyrev
Alternatives 0 enables MBR mode, which gives per-word confidences; it's a different story.
Ok, considering my previous example, I guess that in this line
{ conf: 1, end: 1.02, start: 0.36, word: 'experience' },
the attribute conf is the Minimum Bayes Risk (MBR) confidence.
But I'm still perplexed: in almost all (but not all) of my tests I get the value 1 if the entire sentence is successfully recognized, yet in some cases I get values different from 1. For example, for this audio: https://github.com/solyarisoftware/voskJs/blob/master/audio/8455-210777-0068.wav I get conf: 0.85313 for the first word, your:
$ voskjs --audio=audio/8455-210777-0068.wav --model=models/vosk-model-small-en-us-0.15
model directory : models/vosk-model-small-en-us-0.15
speech file name : audio/8455-210777-0068.wav
grammar : not specified. Default: NO
sample rate : not specified. Default: 16000
max alternatives : undefined
text only / JSON : JSON
Vosk debug level : -1
load model latency : 313ms
transcript text : your power is sufficient i said
transcript latency : 754ms
TIME EVENT VOSK RESULT OBJECT
------ ----------- ------------------
70 partial { partial: '' }
74 partial { partial: '' }
456 partial { partial: '' }
548 partial { partial: 'your' }
620 partial { partial: 'your power is' }
718 partial { partial: 'your power is sufficient i said' }
739 partial { partial: 'your power is sufficient i said' }
754 final { result: [ { conf: 0.85313, end: 0.75, start: 0.54, word: 'your' }, { conf: 1, end: 1.08, start: 0.75, word: 'power' }, { conf: 1, end: 1.23, start: 1.08, word: 'is' }, { conf: 1, end: 1.74, start: 1.23, word: 'sufficient' }, { conf: 1, end: 1.83, start: 1.74, word: 'i' }, { conf: 1, end: 2.16, start: 1.83, word: 'said' } ], text: 'your power is sufficient i said' }
That's not fully clear to me. Documentation on what the conf attribute means would be very welcome. Thanks.
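To spot such cases quickly across many test files, a small sketch could list the words whose conf falls below a threshold. It assumes the result shape without alternatives, as in the final event above. `lowConfidenceWords` and its default threshold of 1 are hypothetical and chosen only for illustration.

```javascript
// Lists the words whose per-word confidence is below a threshold.
// Assumes the shape { result: [ { conf, end, start, word }, ... ], text }.
// NOTE: lowConfidenceWords and the default threshold are hypothetical,
// not part of the Vosk or voskJs API.
function lowConfidenceWords(finalResult, threshold = 1) {
  return (finalResult.result || [])
    .filter((w) => typeof w.conf === 'number' && w.conf < threshold)
    .map((w) => ({ word: w.word, conf: w.conf, start: w.start, end: w.end }));
}

// The final result from the run above.
const finalResult = {
  result: [
    { conf: 0.85313, end: 0.75, start: 0.54, word: 'your' },
    { conf: 1, end: 1.08, start: 0.75, word: 'power' },
    { conf: 1, end: 1.23, start: 1.08, word: 'is' },
    { conf: 1, end: 1.74, start: 1.23, word: 'sufficient' },
    { conf: 1, end: 1.83, start: 1.74, word: 'i' },
    { conf: 1, end: 2.16, start: 1.83, word: 'said' },
  ],
  text: 'your power is sufficient i said',
};

console.log(lowConfidenceWords(finalResult));
// → [ { word: 'your', conf: 0.85313, start: 0.54, end: 0.75 } ]
```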
Where can I find this function call: setAlternatives()?
I was wondering how to interpret the "confidence" value. The values I usually get range from ~50 to >500, and it looks like >300 is OK.
Here is my setup:
You can reproduce my results using the BETA version of the SEPIA STT Server (there are Docker containers for all platforms):
The 'confidence' you see in my results is taken directly from 'FinalResult'.