githubharald / CTCWordBeamSearch

Connectionist Temporal Classification (CTC) decoder with dictionary and language model.
https://towardsdatascience.com/b051d28f3d2e
MIT License
557 stars, 160 forks

Prediction Score #41

Closed: jpdevicente closed this issue 4 years ago

jpdevicente commented 4 years ago

Is there a way to get a score for how probable it is that the decoded output is correct, given the input CTC matrix? I'm guessing it would be something like the accumulated probability of the path followed to produce the output word. Is there a way to retrieve something like this?
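One common way to score a decoded word against the CTC output is the total CTC probability of that label sequence. This is the sum over *all* alignments (paths) that collapse to the word, not only the single best path. The sketch below computes it with the standard CTC forward algorithm in plain Python. It is not taken from this repo or from weinman's fork. Those may report a different score, for example one that also includes the language-model term. The function name `ctc_label_probability` and its arguments are hypothetical. `mat` is assumed to hold per-frame softmax probabilities (T rows, one column per character plus the blank), and the blank's column index is passed in explicitly.

```python
def ctc_label_probability(mat, labels, blank):
    """Hypothetical helper: P(labels | mat) under CTC, summed over all alignments.

    mat:    list of T rows; each row holds per-character probabilities (softmax output).
    labels: list of character indices for the decoded word (no blanks).
    blank:  column index of the CTC blank symbol (an assumption; depends on your model).
    """
    # Extended sequence with blanks between and around labels: -, l1, -, l2, ..., -
    ext = [blank]
    for c in labels:
        ext += [c, blank]
    S, T = len(ext), len(mat)
    if T == 0:
        return 1.0 if not labels else 0.0

    # alpha[s]: total probability of all partial alignments ending at ext[s] at time t
    alpha = [0.0] * S
    alpha[0] = mat[0][ext[0]]
    if S > 1:
        alpha[1] = mat[0][ext[1]]

    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]                      # stay on the same symbol
            if s >= 1:
                a += alpha[s - 1]             # advance by one symbol
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]             # skip a blank between different labels
            new[s] = a * mat[t][ext[s]]
        alpha = new

    # Valid alignments end on the last label or on the trailing blank
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)


# Two frames, characters: index 0 = 'a', index 1 = blank
mat = [[0.6, 0.4], [0.7, 0.3]]
print(ctc_label_probability(mat, [0], blank=1))  # 'aa' + 'a-' + '-a' = 0.42 + 0.18 + 0.28
```

For long inputs, the products underflow quickly. In practice, the same recursion is usually run in log space, with log-sum-exp replacing the additions. If you only want the probability of the single best path, multiply the per-frame maximum probabilities instead.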

weinman commented 4 years ago

My fork includes this information. See the var_seq_len branch.

jpdevicente commented 4 years ago

That was just what I needed, thank you so much!

jpdevicente commented 4 years ago

@weinman I've been getting a segmentation fault at word_beam_search_module.word_beam_search(...) when running your testCustomOp.py, even though I have no problem when using the master branch. Do you have any idea why this could be happening?

weinman commented 4 years ago

Not sure ... this issue should probably migrate to my fork for further investigation/dialogue.

jpdevicente commented 4 years ago

Agreed, though there is no Issues tab in your fork.

weinman commented 4 years ago

Oops. Sorry. Fixed! (That is, I enabled issues; I still have no idea yet what might be causing the segfault.)

githubharald commented 4 years ago

Have you compiled in parallel mode? If so, there was a bug that caused a crash (see commit c1eb1262a85915c033dab4c441b94efab6da5325: the variable m_level1Cache is now protected from parallel access), which I would suggest also applying to your branch.
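The general idea behind that fix is to guard a lazily filled cache that several threads share. The repo's fix is in C++. The snippet below is only a Python analogue of the pattern, not the repo's code, and the class name `LockedCache` is hypothetical. The whole check-then-insert sequence runs under one lock, so two threads cannot both miss on the same key and then write to the shared map concurrently. In C++, that kind of concurrent write to a standard container is undefined behaviour and can crash.

```python
import threading


class LockedCache:
    """Hypothetical thread-safe, lazily filled cache (Python analogue of the C++ fix)."""

    def __init__(self, compute):
        self._compute = compute        # called as compute(key) on a cache miss
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        # Hold the lock across lookup and insert so each key is computed once
        # and the dict is never modified by two threads at the same time.
        with self._lock:
            if key not in self._data:
                self._data[key] = self._compute(key)
            return self._data[key]


cache = LockedCache(lambda k: k * 2)
workers = [threading.Thread(target=lambda: [cache.get(i % 10) for i in range(1000)])
           for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(cache.get(3))  # 6
```

Holding the lock while `compute` runs is the simplest correct choice, but it serializes cache misses. A finer-grained scheme would be needed only if computing an entry is expensive.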

weinman commented 4 years ago

Thanks for pointing that out! My fork's var_seq_len branch doesn't work with parallel mode, but I'll take the opportunity to merge your latest updates in any case.

jpdevicente commented 4 years ago

I finally made it work. I will leave what I did here for anyone else that might have the same problem in the future.

Even though the master branch worked fine for me with the pip-installed version of TensorFlow 1.13, something that differs in word_beam_search_module.word_beam_search() on weinman's branch causes a segmentation fault. I got it working by building TensorFlow 1.13 from source using Bazel. For reference, I'm running it on Ubuntu 18.04 with CUDA 10.0. I built neither the master branch nor weinman's branch in parallel mode, so I doubt that's the problem.

weinman commented 4 years ago

That's weird. It would have been nice to see a stack trace, but I'm glad it works. Thanks for sharing your solution path.