Confidence measures for Quartznet predictions

NVIDIA / NeMo

Confidence measures for Quartznet predictions #1463

Closed: jonaskratochvil closed this issue 3 years ago

jonaskratochvil commented 3 years ago

Hello,

I am interested in getting word-level confidences from the Quartznet model. What would be the best approach for obtaining proxy values for word-level confidence? Taking the logit sequences and normalizing them by length seems like the most straightforward approach, but I am not sure how reliable it would be. Any suggestions or ideas would be appreciated.
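To make the length-normalization idea concrete, here is a minimal sketch in plain Python. It assumes you already have the per-frame logit vectors aligned to one word and the label index chosen at each of those frames; the function names and input layout are hypothetical and not part of NeMo. The score is the geometric mean of the chosen labels' softmax probabilities, i.e. the summed log-probability divided by the number of frames.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one frame of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def length_normalized_confidence(frame_logits, frame_labels):
    """Geometric-mean probability of the chosen labels over a span of frames.

    frame_logits: list of per-frame logit vectors (assumed input, e.g. the
                  frames aligned to a single word).
    frame_labels: index of the chosen label at each of those frames.
    """
    if not frame_logits or len(frame_logits) != len(frame_labels):
        raise ValueError("need one label per frame and at least one frame")
    total_log_prob = sum(
        log_softmax(logits)[label]
        for logits, label in zip(frame_logits, frame_labels)
    )
    # Dividing by the span length keeps long words from being penalized
    # simply for covering more frames.
    return math.exp(total_log_prob / len(frame_logits))
```

The result lies in (0, 1], but it is only a proxy: nothing here calibrates it against actual word error rates.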

janvainer commented 3 years ago

I would also be interested in this.

Omarnabk commented 3 years ago

I am interested in this as well. For now, I'm using a simple approach: averaging the per-letter confidence scores within each word to get one confidence score per word. You can get a timestamp per letter from the beam search with parlance.
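A rough sketch of the per-word averaging in plain Python follows. Note that it swaps in greedy CTC decoding for the beam search mentioned above, purely to stay self-contained. The inputs (`frame_probs` as per-frame softmax outputs, `vocab`, `blank_id`) and both function names are assumptions for illustration, not NeMo or parlance APIs.

```python
def greedy_ctc_chars(frame_probs, vocab, blank_id):
    """Greedy CTC decode returning (char, confidence, frame_index) tuples.

    Repeated labels are collapsed and blanks dropped; each emitted character
    keeps the probability and frame index of the first frame where it appears.
    """
    chars = []
    prev = blank_id
    for t, probs in enumerate(frame_probs):
        best = max(range(len(probs)), key=probs.__getitem__)
        if best != blank_id and best != prev:
            chars.append((vocab[best], probs[best], t))
        prev = best
    return chars


def word_confidences(chars):
    """Split characters at spaces and average letter confidences per word."""
    words = []
    letters, scores = [], []
    # A trailing sentinel space flushes the final word.
    for ch, conf, _frame in chars + [(" ", 1.0, -1)]:
        if ch == " ":
            if letters:
                words.append(("".join(letters), sum(scores) / len(scores)))
                letters, scores = [], []
        else:
            letters.append(ch)
            scores.append(conf)
    return words
```

The frame index in each tuple can be turned into a rough per-letter timestamp by multiplying it by the model's output frame duration, which depends on the model configuration.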

Vishaal-MK commented 3 years ago

How do you get the confidence score of each letter?

Omarnabk commented 3 years ago

By averaging the confidence scores per letter.

titu1994 commented 3 years ago

We don't support this yet, but we will let you know if we add support in the future.