flashlight / flashlight

A C++ standalone library for machine learning
https://fl.readthedocs.io/en/latest/
MIT License

Possible to return word/token level confidence and time stamp/offset via Python binding? #669

Open · trias702 opened this issue 3 years ago

trias702 commented 3 years ago

Question

Dear Sirs,

I am currently using the Python bindings for Flashlight, working with the LexiconDecoder and KenLM classes to build a decoder for my ASR model. I call the decode method on the LexiconDecoder object (the binding class that connects Python to the C++ Flashlight libraries), passing my ASR model's emissions via a pointer, and everything works well.

However, I notice that the flashlight.flashlight_lib_text_decoder.DecodeResult class exposes only five attributes: amScore, lmScore, score, tokens, and words.

My question is: is there any way to get a per-word or per-token confidence score from the decode call? And is there any way to get information about the time step, or approximate time offset, of each decoded word/token? The goal with the latter would be to then perform some sort of alignment between the transcription and the text.
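A possible partial workaround for the timing question, sketched under an assumption that has not been checked against the Flashlight source here: if the tokens list in DecodeResult holds one token index per emission frame (blanks included), padded with one extra entry at each end, then the frame span of each decoded token can be recovered in Python by collapsing CTC-style runs of identical indices. The helper names token_segments and to_seconds, the blank index, and the frame shift are all hypothetical; the frame shift depends on the acoustic model's output stride.

```python
from typing import List, Sequence, Tuple

# (token index, start frame, end frame exclusive)
Segment = Tuple[int, int, int]


def token_segments(
    frame_tokens: Sequence[int], blank: int, strip_padding: bool = True
) -> List[Segment]:
    """Collapse a frame-level token path into one segment per non-blank run.

    Hypothetical helper. Assumption (unverified): frame_tokens has one entry
    per emission frame, plus one padding entry at each end when
    strip_padding is True.
    """
    path = list(frame_tokens)  # also accepts a bound C++ vector or numpy array
    if strip_padding:
        path = path[1:-1]
    segments: List[Segment] = []
    start = 0
    for t in range(1, len(path) + 1):
        # A run ends at the last frame or where the token index changes.
        if t == len(path) or path[t] != path[start]:
            if path[start] != blank:
                segments.append((path[start], start, t))
            start = t
    return segments


def to_seconds(segment: Segment, frame_shift_s: float) -> Tuple[int, float, float]:
    # frame_shift_s is the acoustic model's output stride in seconds (model-specific).
    tok, start, end = segment
    return tok, start * frame_shift_s, end * frame_shift_s
```

With a DecodeResult named result, a call would look like token_segments(result.tokens, blank=blank_idx). If the lexicon uses a silence/word-separator token, it appears as its own segments and may mark word boundaries. Whether the frame-level layout assumption holds should be confirmed against the C++ DecodeResult definition before relying on these offsets.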

I apologise if this is already documented or implemented somewhere else in Flashlight, but I am working solely with the Python bindings.

jacobkahn commented 3 years ago

@trias702 — you'd have to modify both the C++ source and bindings to get those values. If they're not bound now, there isn't a way to get them. cc @tlikhomanenko.
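Short of modifying the C++ source, a rough per-token confidence can be estimated from the emissions that are already on the Python side. The sketch below uses one heuristic (the mean softmax probability of the chosen token over the frames it spans); it is not a Flashlight feature, and the function names are hypothetical. It assumes a frame-level token path aligned one-to-one with the emission rows, with any padding entries already removed. Because log-softmax leaves log-probabilities unchanged, the result is the same whether the emissions are raw scores or already log-softmaxed.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(row: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over one frame of emission scores.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def token_confidences(
    path: Sequence[int],
    emissions: Sequence[Sequence[float]],
    blank: int,
) -> List[Tuple[int, int, int, float]]:
    """Return (token, start_frame, end_frame_exclusive, confidence) per non-blank run.

    Hypothetical helper: `path` must be aligned one-to-one with the rows of
    `emissions` (a T x N list of lists or a 2-D numpy array).
    """
    if len(path) != len(emissions):
        raise ValueError("path and emissions must have the same number of frames")
    out = []
    start = 0
    for t in range(1, len(path) + 1):
        # Close the current run at the end of the path or when the token changes.
        if t == len(path) or path[t] != path[start]:
            tok = path[start]
            if tok != blank:
                # Heuristic: average posterior of `tok` over the frames it spans.
                probs = [math.exp(log_softmax(emissions[f])[tok]) for f in range(start, t)]
                out.append((tok, start, t, sum(probs) / len(probs)))
            start = t
    return out
```

A word-level score could then be taken as, for example, the minimum or the mean over the tokens that make up each word; which aggregate tracks actual errors best is an empirical question for a given model.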