pengcheng-tech opened this issue 3 years ago.
Hypothesis is a NamedTuple object, so you can refer to its attributes directly:
https://github.com/espnet/espnet/blob/master/espnet/nets/beam_search.py#L19-L33
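To illustrate what referring to attributes looks like, here is a minimal stand-in. It is not ESPnet's actual class: the hypothetical `HypothesisSketch` carries only the fields discussed in this thread, with plain floats. The real `Hypothesis` in the linked file has more fields and holds tensors.

```python
from typing import Dict, NamedTuple


class HypothesisSketch(NamedTuple):
    """Hypothetical stand-in for ESPnet's Hypothesis (only the fields used here)."""

    yseq: list                # token id sequence (a tensor in ESPnet; a list here)
    score: float              # combined (weighted) score
    scores: Dict[str, float]  # per-scorer scores, e.g. decoder / ctc / lm


hyp = HypothesisSketch(
    yseq=[1, 42, 7],  # placeholder ids
    score=-57.1623,
    scores={"decoder": -2.6879, "lm": -55.0374, "ctc": -0.8112},
)

# NamedTuple fields are read as attributes, like on any regular object.
print(hyp.score)         # -57.1623
print(hyp.scores["lm"])  # -55.0374
# _fields is standard NamedTuple API listing the field names in order.
print(hyp._fields)       # ('yseq', 'score', 'scores')
```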
Hi, thanks for your response.
Following the link, I modified the code as follows:
nbests = speech2text(speech)
text, *_, score_bundle = nbests[0]
Running the following:
print(score_bundle.score)
print(score_bundle.scores)
I got:
tensor(-57.1623, device='cuda:0')
{'decoder': tensor(-2.6879, device='cuda:0'), 'lm': tensor(-55.0374, device='cuda:0'), 'ctc': tensor(-0.8112, device='cuda:0')}
I think the number -57.1623 is the result of log P_encdec(y|x) + log P_ctc(y|x) + log P_lm(y), where log P_encdec(y|x) is -2.6879, log P_ctc(y|x) is -0.8112 and log P_lm(y) is -55.0374, although the numbers don't quite add up...
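The mismatch is easy to confirm using only the values printed above. A plain, unweighted sum of the three per-scorer values does not reproduce the reported total:

```python
import math

scores = {"decoder": -2.6879, "lm": -55.0374, "ctc": -0.8112}
reported_total = -57.1623

unweighted = sum(scores.values())
print(round(unweighted, 4))                   # -58.5365
print(round(unweighted - reported_total, 4))  # -1.3742
print(math.isclose(unweighted, reported_total, abs_tol=1e-3))  # False
```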
If I denote -57.1623 as nbests[0].score, can I just take nbests[0] through nbests[100] and use nbests[0].score / (nbests[0].score + nbests[1].score + ... + nbests[100].score) as a rough decoding confidence score?
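One caveat about the proposed ratio: the scores are negative log-domain values, so dividing one score by their sum does not give a probability. It also means the best (least negative) hypothesis gets the smallest ratio. If a normalized number is wanted, one common alternative is a softmax over the n-best scores, computed stably with the log-sum-exp trick. This technique is swapped in here and was not suggested in this thread. Only -57.1623 comes from the thread; the other two scores are made up for illustration.

```python
import math
from typing import List


def ratio_of_scores(scores: List[float]) -> List[float]:
    """The ratio proposed above: score_i / sum(scores)."""
    total = sum(scores)
    return [s / total for s in scores]


def softmax_over_nbest(scores: List[float]) -> List[float]:
    """Normalize log-domain scores into weights that sum to 1 within the n-best list."""
    m = max(scores)  # subtract the max before exp() to avoid underflow
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# First value is from the thread; the other two are hypothetical.
nbest_scores = [-57.1623, -58.9, -61.4]

print([round(r, 3) for r in ratio_of_scores(nbest_scores)])     # top hypothesis gets the smallest value
print([round(p, 3) for p in softmax_over_nbest(nbest_scores)])  # top hypothesis gets the largest value
```

Even the softmax only describes how the top hypothesis compares with the others in the beam. Because score mixes decoder, CTC and LM terms with tunable weights, it is not a calibrated confidence, which is in line with the caution in the reply below.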
Thanks a lot
score is the weighted sum of the per-scorer values in scores. You need to set the weights when instantiating the Speech2Text class.
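The numbers reported above are consistent with this. The sketch below makes two assumptions. First, it takes the decoder weight to be 1 - ctc_weight, which is how ESPnet2's Speech2Text combines its weights as far as I know. Second, it infers ctc_weight=0.3 and lm_weight=1.0 from the numbers, because the thread never states them. Under those assumptions, the weighted sum reproduces -57.1623 to four decimals:

```python
import math

# Per-scorer scores printed in the thread.
scores = {"decoder": -2.6879, "lm": -55.0374, "ctc": -0.8112}

# Assumed weighting: decoder = 1 - ctc_weight. The values 0.3 and 1.0 are inferred, not stated.
ctc_weight, lm_weight = 0.3, 1.0
weights = {"decoder": 1.0 - ctc_weight, "ctc": ctc_weight, "lm": lm_weight}

weighted = sum(weights[k] * v for k, v in scores.items())
print(round(weighted, 4))  # -57.1623
```

Other settings, such as a nonzero length penalty, may add further terms to the total. Treat this as a consistency check, not a description of ESPnet internals.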
You can get scores for an arbitrary number of n-best hypotheses by passing the nbest argument to Speech2Text, but I don't think it is straightforward to treat them as confidence scores.
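Once nbest is greater than 1, the returned list can be turned into (text, score) pairs in one pass. This is a minimal sketch. Like the snippet above, it assumes that each entry's last element is the hypothesis and that hypothesis.score is either a plain number or a 0-d tensor; `float()` accepts both. `collect_nbest_scores`, `FakeHyp` and the data are hypothetical placeholders.

```python
from typing import List, NamedTuple, Sequence, Tuple


class FakeHyp(NamedTuple):
    """Hypothetical stand-in; a real hypothesis would carry tensors."""

    score: float


def collect_nbest_scores(nbests: Sequence[tuple]) -> List[Tuple[str, float]]:
    """Return (text, score) for every n-best entry, keeping the list's order."""
    pairs = []
    for text, *_, hyp in nbests:
        pairs.append((text, float(hyp.score)))  # float() also accepts 0-d tensors
    return pairs


# Placeholder n-best list; only -57.1623 comes from the thread.
nbests = [
    ("hello world", [], [], FakeHyp(-57.1623)),
    ("hello word", [], [], FakeHyp(-58.9)),
]

for text, score in collect_nbest_scores(nbests):
    print(f"{score:9.4f}  {text}")
```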
Thanks for the comment.
For now I treat score (i.e., -57.1623) as a rough confidence score that indicates how confident the model is in its prediction of the semantic meaning of the audio. From my observation, the score of nbests[0] is higher than that of nbests[1], so I think it is adequate for my purpose.
Hi,
Thanks for your work. I am trying to use the pre-trained model, but I don't know how to get the decoding score for the corresponding decoding results.
The code above only prints text. I would like to get the decoding confidence as well.
I checked the Speech2Text class.
From the code above, I suspect the confidence can be obtained from "hyp", but it is not clear to me how to parse "hyp" to get the score.