flashlight / text

Text utilities, including beam search decoding, tokenizing, and more, built for use in Flashlight.
MIT License

Static kTrieMaxLabel=6 causes issues with phoneme-based recognition #75

Open JackTemaki opened 1 year ago

JackTemaki commented 1 year ago

Bug Description

I tried building ASR systems on a common standard task (LibriSpeech-100h) using the torchaudio CTC decoder, which uses the flashlight/text library as its decoding backend. While my subword (BPE) based setups worked fine, the phoneme-based ones did not.

The standard LibriSpeech lexicon contains, for example, the following 7 words, which all map to the same phone sequence in ARPA notation:

BAE B AY#
BAI B AY#
BI B AY#
BUY B AY#
BY B AY#
BY' B AY#
BYE B AY#

As a result, the word BY, for example, was no longer recognized. The log contains the message: [Trie] Trie label number reached limit: 6, which correctly reports that the limit was applied. However, this limit is very low and cannot be changed without recompiling. The message also did not look like a serious issue to me at first.
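To illustrate the effect, here is a minimal, self-contained sketch of a trie whose nodes hold at most six word labels. It is not the flashlight/text implementation: Node, insertWord, kMaxLabels, and droppedHomophones are hypothetical names, and the insertion order is simply the order of the lexicon excerpt above. Which word is lost depends on the order in which the real decoder inserts words, so the dropped word in this sketch need not match the one dropped in the actual setup.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-node label limit named in the issue title.
constexpr std::size_t kMaxLabels = 6;

// Simplified trie node: children keyed by phoneme, plus the words ending here.
struct Node {
  std::map<std::string, std::unique_ptr<Node>> children;
  std::vector<std::string> words;
};

// Inserts `word` under its phoneme sequence. Returns false when the word is
// dropped because the final node already holds kMaxLabels words.
bool insertWord(Node& root, const std::vector<std::string>& phones,
                const std::string& word) {
  assert(!phones.empty());
  Node* node = &root;
  for (const auto& phone : phones) {
    auto& child = node->children[phone];
    if (!child) {
      child = std::make_unique<Node>();
    }
    node = child.get();
  }
  if (node->words.size() >= kMaxLabels) {
    return false;  // the word can no longer be reached through this node
  }
  node->words.push_back(word);
  return true;
}

// Inserts the seven homophones from the lexicon excerpt (all "B AY#") and
// returns the words that were rejected by the limit.
std::vector<std::string> droppedHomophones() {
  const std::vector<std::string> words = {"BAE", "BAI", "BI", "BUY",
                                          "BY",  "BY'", "BYE"};
  const std::vector<std::string> phones = {"B", "AY#"};
  Node root;
  std::vector<std::string> dropped;
  for (const auto& word : words) {
    if (!insertWord(root, phones, word)) {
      dropped.push_back(word);
    }
  }
  return dropped;
}
```

With seven words sharing one phone sequence and a limit of six, exactly one word is rejected no matter how the words are ordered. In this sketch it is the last one inserted.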

Reproduction Steps

JackTemaki commented 1 year ago

After removing the limit check with the following patch, my word-error-rate went from 20.3% to 17.9%:

40,46c40,41
<   if (node->labels.size() < kTrieMaxLabel) {
<     node->labels.push_back(label);
<     node->scores.push_back(score);
<   } else {
<     std::cerr << "[Trie] Trie label number reached limit: " << kTrieMaxLabel
<               << "\n";
<   }
---
>   node->labels.push_back(label);
>   node->scores.push_back(score);

Was there any reason why this arbitrary limit was put there in the first place?
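If some cap turns out to be needed, one alternative to removing the check entirely would be to make the limit a runtime parameter. The following is only a sketch under that assumption: LabelNode and addLabel are hypothetical names, not part of the flashlight/text API, and the node is reduced to the two vectors touched by the patch above.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical node, reduced to the two vectors modified in the patch.
struct LabelNode {
  std::vector<int> labels;
  std::vector<float> scores;
};

// Hypothetical helper: `maxLabels` is chosen at runtime and defaults to
// "no limit". Returns false if the label was rejected, so the caller can
// decide how loudly to report it instead of only printing to stderr.
bool addLabel(LabelNode& node, int label, float score,
              std::size_t maxLabels = std::numeric_limits<std::size_t>::max()) {
  if (node.labels.size() >= maxLabels) {
    return false;
  }
  node.labels.push_back(label);
  node.scores.push_back(score);
  return true;
}
```

Called without the last argument, addLabel behaves like the patched code; passing 6 reproduces the current behavior, with the rejection visible to the caller.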

JackTemaki commented 6 months ago

Hello, is there still interest in discussing this or getting it fixed? With the proposed fix, the decoder compares really well to our own decoder implementation, and given how simple it is to use, I would like to use it for a scientific publication. Currently I provide a patch file with the setup / container image, which works, but I would prefer this to be fixed directly in the repository.

If there is interest, I can open the PR, but first I want to clarify whether there is a reason for this limit that I am not aware of.