Confidence value in nbest list is not normalized to 1.0?

fquirin commented 3 years ago

I was wondering how to interpret the "confidence" value. The values I usually get range from ~50 to >500, and anything >300 seems to indicate a good result.

Here is my setup:

Vosk v0.3.30
Small EN and DE models (v0.15)
Streaming audio chunks in 16 kHz mono
Test files

You can reproduce my results using the BETA version of the SEPIA STT Server (there are Docker containers for all platforms):

12:30:17 - {"type":"result","msg_id":7,"code":200,"transcript":" one two three four five","isFinal":true,"confidence":529.317749,"features":{}}
...
12:30:20 - {"type":"result","msg_id":14,"code":200,"transcript":" six seven eight nine ten","isFinal":true,"confidence":557.565491,"features":{}}

The 'confidence' you see in my results is taken directly from 'FinalResult'.
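For anyone reading these numbers programmatically, here is a minimal Python sketch that pulls the text and the `confidence` of each alternative out of a `FinalResult()` JSON string in the n-best format (an `alternatives` array whose entries carry `confidence` and `text`, as in the voskJs outputs further down this thread). The helper name `nbest_confidences` is hypothetical, and it deliberately returns the values unscaled, since their scale is exactly what is in question here.

```python
import json
from typing import List, Tuple


def nbest_confidences(final_result_json: str) -> List[Tuple[str, float]]:
    """Return (text, confidence) pairs from an n-best FinalResult() string.

    Hypothetical helper: assumes an "alternatives" list whose entries have
    "text" and "confidence" keys. The confidence is returned unchanged;
    it is NOT normalized to the 0..1 range.
    """
    data = json.loads(final_result_json)
    pairs = []
    for alt in data.get("alternatives", []):
        # n-best text can start with a space, e.g. " experience proves this"
        pairs.append((alt.get("text", "").strip(), float(alt["confidence"])))
    return pairs


example = '{"alternatives": [{"confidence": 175.552368, "text": " experience proves this"}]}'
print(nbest_confidences(example))
```

If the JSON has no `alternatives` key (for example, output produced without alternatives enabled), the helper simply returns an empty list.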

fquirin commented 3 years ago

Just in case anyone wants to reproduce exactly the same results with my server, here is the options object for the 'welcome' event:

The audio file was: test-audio/easy_counting_en2.ogg

solyarisoftware commented 3 years ago

I asked myself the same question (using my Node.js Vosk wrapper: https://github.com/solyarisoftware/voskJs). I believe this is a change in Vosk v0.3.30:

$ voskjs --audio=audio/2830-3980-0043.wav --model=models/vosk-model-small-en-us-0.15 --alternatives=3

model directory      : models/vosk-model-small-en-us-0.15
speech file name     : audio/2830-3980-0043.wav
grammar              : not specified. Default: NO
sample rate          : not specified. Default: 16000
max alternatives     : 3
text only / JSON     : JSON
Vosk debug level     : -1

load model latency   : 362ms

{
  alternatives: [
    {
      confidence: 175.552368,
      result: [
        { end: 1.02, start: 0.36, word: 'experience' },
        { end: 1.35, start: 1.02, word: 'proves' },
        { end: 1.98, start: 1.35, word: 'this' }
      ],
      text: ' experience proves this'
    }
  ]
}

transcript latency : 587ms

By contrast, in earlier Vosk releases (e.g. v0.2.27), each item in the result object included a per-word confidence (<= 1): https://github.com/solyarisoftware/voskJs/tree/master/examples#simple-program-for-a-sentence-based-speech-to-text. Here, instead, there is a single confidence value for each "alternative" result.

The change isn't clear to me either. It seems that confidence is now reported per result (sentence) rather than per word. I still don't understand why the confidence value is > 1.

solyarisoftware commented 3 years ago

I've just noticed a small related change (possibly a minor formatting bug).

If I do NOT call setAlternatives(), I get the old format:

{
  result: [
    { conf: 1, end: 1.02, start: 0.36, word: 'experience' },
    { conf: 1, end: 1.35, start: 1.02, word: 'proves' },
    { conf: 1, end: 1.74, start: 1.35, word: 'this' }
  ],
  text: 'experience proves this'
}

So the confidence is set to 1 for each word (which makes sense, though it may be of little use).

If instead I call setAlternatives(), I get the new format:

{
  alternatives: [
    {
      confidence: 197.583099,
      result: [
        { end: 1.02, start: 0.36, word: 'experience' },
        { end: 1.35, start: 1.02, word: 'proves' },
        { end: 1.98, start: 1.35, word: 'this' }
      ],
      text: ' experience proves this'
    }
  ]
}

Minor points, just a reminder.
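Until the format settles, client code has to cope with both shapes shown above. A minimal Python sketch, assuming the raw `FinalResult()` JSON uses the same keys as these outputs (`result`, `conf`, `word`, `alternatives`); the helper name `words_with_conf` is hypothetical, and it only reports what each format contains without interpreting the values.

```python
import json
from typing import List, Optional, Tuple

Word = Tuple[str, Optional[float]]


def words_with_conf(final_result_json: str) -> List[Word]:
    """Return (word, conf) pairs from either result format.

    Hypothetical helper.
    Old format (no alternatives): top-level "result" list, each item has "conf".
    New format (alternatives set): "alternatives" list; the words of the top
    alternative carry no "conf", so None is returned for them.
    """
    data = json.loads(final_result_json)
    if "alternatives" in data:
        alts = data["alternatives"]
        items = alts[0].get("result", []) if alts else []
    else:
        items = data.get("result", [])
    # .get() yields None when the per-word "conf" key is absent
    return [(item["word"], item.get("conf")) for item in items]


old = '{"result": [{"conf": 1, "end": 1.02, "start": 0.36, "word": "experience"}], "text": "experience"}'
new = ('{"alternatives": [{"confidence": 197.583099, "result": '
       '[{"end": 1.02, "start": 0.36, "word": "experience"}], "text": " experience"}]}')
print(words_with_conf(old))
print(words_with_conf(new))
```

Returning `None` rather than a default such as 1.0 keeps "no per-word confidence in this format" distinguishable from "fully confident".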

fquirin commented 3 years ago

"result object items included the confidence (<= 1) for each word"

Yes, you are right. There is another funny thing: if you set alternatives to 0 and words to true, you get "confidence: 1" for each word; if you set alternatives to 1 (which is essentially the same as 0), the "confidence" field for each word doesn't show up at all ;-)

nshmyrev commented 3 years ago

Confidence for alternatives is not fully functional yet; we will change it in upcoming versions.

Alternatives 0 enables MBR mode, which gives per-word confidences; that's a different story.
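As far as I know, the Vosk Python bindings expose these two settings as `KaldiRecognizer.SetMaxAlternatives()` and `KaldiRecognizer.SetWords()`; other wrappers may name them differently. The sketch below only illustrates the mode switch described above. The helper names `use_mbr_word_confidence` and `use_nbest` are hypothetical, and they accept any object with those two methods so the logic can be exercised without loading a model.

```python
from typing import Protocol


class RecognizerLike(Protocol):
    """Minimal interface matching the two KaldiRecognizer setters used here."""

    def SetMaxAlternatives(self, n: int) -> None: ...

    def SetWords(self, enabled: bool) -> None: ...


def use_mbr_word_confidence(rec: RecognizerLike) -> RecognizerLike:
    """Alternatives 0 -> MBR mode, per-word 'conf' values in 'result'."""
    rec.SetMaxAlternatives(0)
    rec.SetWords(True)  # request the per-word 'result' array
    return rec


def use_nbest(rec: RecognizerLike, n: int) -> RecognizerLike:
    """Alternatives >= 1 -> n-best list, one 'confidence' per alternative.

    Per the maintainer, these confidences are not fully functional yet
    and are not normalized to 0..1.
    """
    if n < 1:
        raise ValueError("n-best mode needs at least one alternative")
    rec.SetMaxAlternatives(n)
    return rec
```

With a real model, this would be called as `use_mbr_word_confidence(KaldiRecognizer(Model("models/vosk-model-small-en-us-0.15"), 16000))` after `from vosk import Model, KaldiRecognizer`; the model path is the one used earlier in this thread.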

fquirin commented 3 years ago

I see. What about the general confidence values of ~500 etc. (alternatives 1, words false)? ~Can we simply scale this by some factor, or does it depend on dynamic properties like the length of the input...?~ Should we ignore this for now?

solyarisoftware commented 3 years ago

Hi @nshmyrev

"Alternatives 0 enables MBR mode, which gives per-word confidences; that's a different story."

OK, considering my previous example, I guess that in this line

 { conf: 1, end: 1.02, start: 0.36, word: 'experience' },

the attribute conf is the Minimum Bayes Risk (MBR) confidence.

But I'm still perplexed: in most (but not all) of my tests I get the value 1 when the entire sentence is recognized correctly, but in some cases I get values other than 1. For example, with this audio: https://github.com/solyarisoftware/voskJs/blob/master/audio/8455-210777-0068.wav I get conf: 0.85313 for the first word, "your":

$ voskjs --audio=audio/8455-210777-0068.wav --model=models/vosk-model-small-en-us-0.15

model directory      : models/vosk-model-small-en-us-0.15
speech file name     : audio/8455-210777-0068.wav
grammar              : not specified. Default: NO
sample rate          : not specified. Default: 16000
max alternatives     : undefined
text only / JSON     : JSON
Vosk debug level     : -1
load model latency   : 313ms
transcript text      : your power is sufficient i said
transcript latency   : 754ms

  TIME EVENT       VOSK RESULT OBJECT
------ ----------- ------------------
    70 partial     { partial: '' }
    74 partial     { partial: '' }
   456 partial     { partial: '' }
   548 partial     { partial: 'your' }
   620 partial     { partial: 'your power is' }
   718 partial     { partial: 'your power is sufficient i said' }
   739 partial     { partial: 'your power is sufficient i said' }
   754 final       { result: [ { conf: 0.85313, end: 0.75, start: 0.54, word: 'your' }, { conf: 1, end: 1.08, start: 0.75, word: 'power' }, { conf: 1, end: 1.23, start: 1.08, word: 'is' }, { conf: 1, end: 1.74, start: 1.23, word: 'sufficient' }, { conf: 1, end: 1.83, start: 1.74, word: 'i' }, { conf: 1, end: 2.16, start: 1.83, word: 'said' } ], text: 'your power is sufficient i said' }

That's not fully clear to me. Documentation of the conf attribute would be very welcome. Thanks!
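If the per-word MBR values are treated as a rough signal (an assumption; this thread does not establish their exact meaning), a client could flag the words that fall below some cutoff. A minimal Python sketch using the final result posted above; the helper name `low_confidence_words` and the 0.9 threshold are hypothetical, chosen only for illustration and not recommended by Vosk.

```python
from typing import Dict, List


def low_confidence_words(result: Dict, threshold: float = 0.9) -> List[str]:
    """Return words whose MBR 'conf' is below threshold (hypothetical helper)."""
    return [
        item["word"]
        for item in result.get("result", [])
        # Words without a 'conf' key (e.g. n-best output) are skipped.
        if "conf" in item and item["conf"] < threshold
    ]


final = {
    "result": [
        {"conf": 0.85313, "end": 0.75, "start": 0.54, "word": "your"},
        {"conf": 1, "end": 1.08, "start": 0.75, "word": "power"},
        {"conf": 1, "end": 1.23, "start": 1.08, "word": "is"},
        {"conf": 1, "end": 1.74, "start": 1.23, "word": "sufficient"},
        {"conf": 1, "end": 1.83, "start": 1.74, "word": "i"},
        {"conf": 1, "end": 2.16, "start": 1.83, "word": "said"},
    ],
    "text": "your power is sufficient i said",
}
print(low_confidence_words(final))
```

Skipping words that lack `conf` keeps the helper from silently treating n-best output as fully confident.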

ester-levi commented 2 years ago

Where can I find this function: setAlternatives()?

alphacep / vosk-api

Confidence value in nbest list is not normalized to 1.0? #604