coqui-ai / STT

🐸STT - The deep learning toolkit for Speech-to-Text. Training and deploying STT models has never been so easy.
https://coqui.ai
Mozilla Public License 2.0

Ability to get character/word level confidence scores #2021

Open remyzerems opened 2 years ago

remyzerems commented 2 years ago

When using *WithMetadata functions, it would be helpful to get access to the confidence score of each token of a given candidate transcript.

It could be exposed in the TokenMetadata class as a public member named confidence.
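To illustrate how a per-token confidence could also give the word-level scores mentioned in the title, here is a minimal sketch. TokenWithConfidence, WordConfidence and WordConfidences are hypothetical names, not part of the Coqui STT API. The sketch also assumes that confidence is a probability in [0, 1], that words are separated by a " " token, and that a word is scored by its weakest token, which is only one of several possible choices.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a TokenMetadata extended with a `confidence`
// member; this is NOT the actual Coqui STT struct.
struct TokenWithConfidence {
  std::string text;  // one token of the transcript, e.g. a single character
  float confidence;  // assumed to be a probability in [0, 1]
};

// Hypothetical result type: one entry per word.
struct WordConfidence {
  std::string word;
  float confidence;
};

// Groups tokens into words on " " and scores each word by its
// lowest-confidence token.
std::vector<WordConfidence> WordConfidences(
    const std::vector<TokenWithConfidence>& tokens) {
  std::vector<WordConfidence> words;
  std::string current;
  float lowest = 1.0f;
  for (const auto& token : tokens) {
    assert(token.confidence >= 0.0f && token.confidence <= 1.0f);
    if (token.text == " ") {
      // A space closes the current word, if there is one.
      if (!current.empty()) {
        words.push_back({current, lowest});
      }
      current.clear();
      lowest = 1.0f;
      continue;
    }
    current += token.text;
    lowest = std::min(lowest, token.confidence);
  }
  // Flush the last word, which has no trailing space.
  if (!current.empty()) {
    words.push_back({current, lowest});
  }
  return words;
}
```

Taking the minimum makes one doubtful character flag the whole word. An average or a product of the token confidences would be equally easy to swap in.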

reuben commented 2 years ago

The main decoder loop is here: https://github.com/coqui-ai/STT/blob/main/native_client/ctcdecode/ctc_beam_search_decoder.cpp

The trie data structure used to keep individual tokens is here: https://github.com/coqui-ai/STT/blob/main/native_client/ctcdecode/path_trie.h

The log_prob_c member contains the log-probability for the current character. At decode time, only the accumulated score from the beginning of the transcript up to the current node (the score member) is copied into the Output structure; the per-character log_prob_c values are not: https://github.com/coqui-ai/STT/blob/11c2edb06803f38aea9d62026e54333158c876ad/native_client/ctcdecode/ctc_beam_search_decoder.cpp#L266
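As a rough illustration of how per-character values could be recovered from such a trie, the sketch below walks from a leaf back to the root and exponentiates each node's log_prob_c. Node is a simplified, hypothetical stand-in with only the fields needed here; the real class in path_trie.h has more members and its own ownership rules. CharacterConfidences is likewise an illustrative name, not an existing function.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Simplified, hypothetical stand-in for a decoder trie node.
struct Node {
  int character;       // label index for this node; -1 marks the root
  float log_prob_c;    // log-probability of this node's character
  const Node* parent;  // nullptr at the root
};

// Walks from `leaf` back to the root and returns one probability per
// character, in transcript order. The root carries no character, so it
// is skipped.
std::vector<float> CharacterConfidences(const Node* leaf) {
  assert(leaf != nullptr);
  std::vector<float> confidences;
  for (const Node* n = leaf; n->parent != nullptr; n = n->parent) {
    confidences.push_back(std::exp(n->log_prob_c));
  }
  // The walk collected values leaf-first; flip them into reading order.
  std::reverse(confidences.begin(), confidences.end());
  return confidences;
}
```

Whether exp(log_prob_c) is the most useful confidence measure is an open question. The sketch only shows that the per-character value is reachable by following parent links.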

This Output structure is then converted into the public facing Metadata/CandidateTranscript/TokenMetadata here: https://github.com/coqui-ai/STT/blob/11c2edb06803f38aea9d62026e54333158c876ad/native_client/modelstate.cc#L40

Basically, to do this one would have to write a bunch of boring code that carries the per-character log_prob_c values through each layer of the implementation (trie node, then the Output structure, then TokenMetadata) so they can be used at the API level.
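A minimal sketch of what the last hop of that plumbing might look like, under stated assumptions: DecoderOutput and PublicToken are hypothetical, simplified stand-ins for the decoder's Output structure and the public TokenMetadata, token_log_probs is the hypothetical new field, and ToPublicTokens is an illustrative name. It is not a patch against the actual code in modelstate.cc.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified decoder-side result for one candidate.
struct DecoderOutput {
  std::vector<unsigned int> tokens;     // label indices
  std::vector<unsigned int> timesteps;  // one timestep per token
  std::vector<float> token_log_probs;   // hypothetical new field
};

// Hypothetical public-facing token carrying a confidence value.
struct PublicToken {
  std::string text;
  unsigned int timestep;
  float confidence;  // probability in [0, 1]
};

// Converts the decoder-side arrays into public tokens, mapping each
// label index to its text and each log-probability to a probability.
std::vector<PublicToken> ToPublicTokens(
    const DecoderOutput& out, const std::vector<std::string>& alphabet) {
  // All three arrays must describe the same tokens.
  assert(out.tokens.size() == out.timesteps.size());
  assert(out.tokens.size() == out.token_log_probs.size());
  std::vector<PublicToken> result;
  result.reserve(out.tokens.size());
  for (std::size_t i = 0; i < out.tokens.size(); ++i) {
    result.push_back({alphabet.at(out.tokens[i]),  // throws on a bad index
                      out.timesteps[i],
                      std::exp(out.token_log_probs[i])});
  }
  return result;
}
```

Keeping the log-probabilities in a parallel vector, rather than changing the element type of tokens, would leave existing users of the decoder output untouched. That is a design guess, not something the maintainers have stated.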

reuben commented 2 years ago

@juliandarley ^