githubharald / CTCDecoder

Connectionist Temporal Classification (CTC) decoding algorithms: best path, beam search, lexicon search, prefix search, and token passing. Implemented in Python.
https://towardsdatascience.com/3797e43a86c
MIT License
817 stars 182 forks source link

Handling duplicate paths in Beam Search #1

Closed giovannirescia closed 6 years ago

giovannirescia commented 6 years ago

Hey! I am wondering if you could help me to figure something out, in [1] you mentioned that summing up probabilities for Pr, Pr+ and Pr- leads to better results, I tried it in my implementation based on [2], and it does get better, but my probabilities are getting positive (I am working with log probabilities, so the values should be between (-inf, 0]). Did you experienced this phenomenon while implementing the sum in your algorithm?

[1] Stackexchange CTC [2] CTC implementation github

PS: Sorry if this is not the place to make this question, but I have no other way to reach you.

githubharald commented 6 years ago

Hi, you also have to remove the if block "if y[:-1] in B_:" (lines 33 and 34) from [2]. I'm working with "normal" probabilities and I'm getting values between 0 and 1 for the probability of a labelling.

Sundy1219 commented 6 years ago

when applying log probabilities, I think summing up probabilities for prTotal, prNonBlank and prBlank should be (1) curr.entries[y].prNonBlank=np.logaddexp(curr.entries[y].prNonBlank, prNonBlank) (2) curr.entries[y].prBlank=np.logaddexp(curr.entries[y].prBlank, prBlank) (3) curr.entries[y].prTotal+=np.logaddexp(prBlank,prNonBlank)

is it right? looking forward to your reply @githubharald

githubharald commented 6 years ago

in line (3) you also have to use logaddexp for curr.entries[y].prTotal instead of using +=.

Sundy1219 commented 6 years ago

Do you mean like this ? (3) curr.entries[y].prTotal=np.logaddexp(curr.entries[y].prTotal,np.logaddexp(prBlank,prNonBlank)) @githubharald

githubharald commented 6 years ago

yes. You can easily check if your changes are correct. First use the original code and print out the log of prTotal of the best labelling. Then use your code changes and print out prTotal. The values should be (more or less in case precision was lost by float calculations) the same.

# add this below the line "bestLabelling=last.sort()[0]"
print('bestLabelling prTotal:', last.entries[bestLabelling].prTotal)
Sundy1219 commented 6 years ago

Should initial values of beam entries be modified ? before before

after after

Looking forward to your reply . thank you @githubharald

githubharald commented 6 years ago

if you want to use log-probs, you have to replace each 0 with -inf, i.e. float("-inf").