Open remyzerems opened 2 years ago
The main decoder loop is here: https://github.com/coqui-ai/STT/blob/main/native_client/ctcdecode/ctc_beam_search_decoder.cpp
The trie data structure used to keep individual tokens is here: https://github.com/coqui-ai/STT/blob/main/native_client/ctcdecode/path_trie.h
The `log_prob_c` member contains the log-probability for the current character. At decode time, only the accumulated score from the beginning of the transcript until the current node (the `score` member) is copied into the Output structure: https://github.com/coqui-ai/STT/blob/11c2edb06803f38aea9d62026e54333158c876ad/native_client/ctcdecode/ctc_beam_search_decoder.cpp#L266
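To make the distinction between the per-character value and the accumulated score concrete, here is a minimal sketch of how per-token values could be recovered by walking a trie path from a leaf back to the root. `TrieNode`, `TokenConfidence`, and `collect_token_confidences` are hypothetical names for illustration only; this is a simplified stand-in, not the actual `PathTrie` class, and it assumes each node keeps a parent pointer alongside `log_prob_c` and `score`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical, simplified stand-in for a decoder trie node.
// The real PathTrie has more members; only the ones discussed here are modeled.
struct TrieNode {
    const TrieNode* parent;  // nullptr for the root node
    int character;           // label index of this node (-1 for the root)
    float log_prob_c;        // log-probability of this character alone
    float score;             // accumulated score from the root to this node
};

// Hypothetical per-token result that could be passed up to the API layer.
struct TokenConfidence {
    int character;
    float log_prob;    // copied from log_prob_c
    float confidence;  // exp(log_prob), in (0, 1] when log_prob <= 0
};

// Walk from the leaf to the root, collecting one entry per character,
// then reverse so the result is in transcript order.
std::vector<TokenConfidence> collect_token_confidences(const TrieNode* leaf) {
    assert(leaf != nullptr);
    std::vector<TokenConfidence> out;
    for (const TrieNode* n = leaf; n->parent != nullptr; n = n->parent) {
        out.push_back({n->character, n->log_prob_c, std::exp(n->log_prob_c)});
    }
    std::reverse(out.begin(), out.end());
    return out;
}
```

Whether `exp(log_prob_c)` is a good confidence measure is a design choice; in the real decoder the accumulated score may also include language-model contributions, so it is not necessarily the plain sum of the per-character values.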
This Output structure is then converted into the public-facing Metadata/CandidateTranscript/TokenMetadata here: https://github.com/coqui-ai/STT/blob/11c2edb06803f38aea9d62026e54333158c876ad/native_client/modelstate.cc#L40
Basically, to do this, one would have to write a bunch of boring code shuffling this data through the layers of the implementation so that it can be used at the API level.
@juliandarley ^
When using `*WithMetadata` functions, it would be helpful to get access to the confidence score of each token of a given candidate transcript. It could be made available in the `TokenMetadata` class as a public member `confidence`.
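As an illustration of how the requested member might be used, here is a small sketch of client code that flags low-confidence tokens. `TokenWithConfidence` and `low_confidence_tokens` are hypothetical names, and the `confidence` field does not exist in the current API; the other fields are an assumption about what token metadata carries today.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical mirror of a token metadata entry with the requested field.
// This is not the library's TokenMetadata type.
struct TokenWithConfidence {
    std::string text;       // token text (assumed existing field)
    unsigned int timestep;  // decoder timestep (assumed existing field)
    float start_time;       // start time in seconds (assumed existing field)
    float confidence;       // proposed new member, assumed to lie in [0, 1]
};

// Return the indices of tokens whose confidence is below the threshold.
std::vector<std::size_t> low_confidence_tokens(
    const std::vector<TokenWithConfidence>& tokens, float threshold) {
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (tokens[i].confidence < threshold) {
            indices.push_back(i);
        }
    }
    return indices;
}
```

A caller could use the returned indices to highlight uncertain characters in a transcript or to trigger a manual review.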