NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Mismatch in log_probs dim and vocab dim? #1188

Closed st1992 closed 3 years ago

st1992 commented 4 years ago

After transcription, I checked the size of log_probs: result['log_probs'].size() returned torch.Size([1, 7528, 29])

I am using Google Colab.

print(len(quartznet.decoder.vocabulary)) returned 28

Why is there a mismatch of 28 and 29?

khursani8 commented 4 years ago

"Why is there a mismatch of 28 and 29?" I think it is because the CTC blank token is not in the vocabulary.

okuchaiev commented 4 years ago

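Yes, the CTC blank token is a special one and it is NOT part of the vocabulary. There is, however, a dimension for it in the output logits, which is why the last dimension of log_probs is 29 (28 vocabulary entries plus the blank).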
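
To make the 28 vs. 29 relationship concrete, here is a minimal pure-Python sketch of greedy CTC decoding. It is not NeMo's decoder: the vocabulary below is an illustrative 28-symbol list (space, a-z, apostrophe), the helper names ctc_greedy_decode and one_hot_frame are hypothetical, and the blank is assumed to sit at the extra, last index (len(vocabulary)). Check your model's configuration before relying on that index.

```python
import string

# Illustrative 28-symbol vocabulary (an assumption, not read from the model).
vocabulary = [" "] + list(string.ascii_lowercase) + ["'"]

# Assumption: the blank token occupies the extra, last output index.
blank_id = len(vocabulary)          # 28
num_classes = len(vocabulary) + 1   # 29, the last dimension of log_probs


def ctc_greedy_decode(log_probs, vocabulary, blank_id):
    """Decode T frames of per-class scores by argmax, then collapse.

    log_probs: list of T frames, each a list of num_classes scores.
    """
    # Pick the highest-scoring class index in every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in log_probs]
    chars = []
    prev = None
    for idx in best:
        # Drop consecutive repeats and drop blanks; blanks never map to a character.
        if idx != prev and idx != blank_id:
            chars.append(vocabulary[idx])
        prev = idx
    return "".join(chars)


def one_hot_frame(idx, n):
    # Toy frame: score 0.0 for the chosen class, very low elsewhere.
    return [0.0 if i == idx else -100.0 for i in range(n)]


h, i = vocabulary.index("h"), vocabulary.index("i")
frames = [one_hot_frame(k, num_classes) for k in (h, h, blank_id, i, i)]
print(ctc_greedy_decode(frames, vocabulary, blank_id))  # hi
```

Indexing vocabulary with the blank index would raise an IndexError, which is why the decoder has to filter the blank out before looking up characters.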