flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Decoder does not return the best output when decoding in Docker — problem with words that have the same spelling (i.e. the same lexicon entry)? #905

Closed trangtv57 closed 3 years ago

trangtv57 commented 3 years ago

Bug Description

I have trained a streaming ConvNets model for my language. When I decode in Docker with a lexicon that contains no OOV words, everything is OK, but after adding OOV words to the lexicon, something goes wrong: the decoded output always contains the OOV word instead of the normal word, even though the score of the normal word is always higher. I can't figure out what is happening.

Reproduction Steps

First, here is the beam dump when decoding without the OOV lexicon: 20200730080731-87ac3ae0-voice_transcript_v2 | 4375.661260 | 4386.166794 | -58.722191 | 8.333333 | trời ơi em a đưa cái cái điện thoại cho nó kêu là ba mươi tám ngàn lận cái cái nó móc ra tờ hai trăm hồi có tiền thối không nói vậy đó

And after adding the OOV lexicon, the output always replaces 'hai' with 'high' or another word that has the same spelling as 'hai', such as 'height': 20200730080731-87ac3ae0-voice_transcript_v2 | 4370.939512 | 4386.323524 | -65.691446 | 13.888889 | trời ơi em a đưa cái cái điện thoại cho nó kêu là ba mươi tám ngàn lận cái cái nó móc ra từ high chấm hỏi có tiền thối không nói vậy đó

Focus on the word in bold. These are the entries for these words in the lexicon file:

hai    _hai
hai    _h ai
high  _hai
high  _h ai
height _hai

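The situation above can be reproduced with a toy lexicon trie: words are inserted by their token spelling, so homophones like "hai", "high", and "height" all end up as labels on the same trie node. A minimal Python sketch (the `max_labels` cap and class names are my own, chosen to mirror wav2letter's per-node label limit):

```python
class TrieNode:
    """A node in a lexicon trie; words with the same spelling share one node."""

    def __init__(self, max_labels=6):
        self.children = {}        # token -> TrieNode
        self.labels = []          # words whose spelling ends at this node
        self.max_labels = max_labels

    def insert(self, spelling, word):
        node = self
        for token in spelling:
            node = node.children.setdefault(token, TrieNode(self.max_labels))
        # only the first max_labels words with this spelling are kept
        if len(node.labels) < node.max_labels:
            node.labels.append(word)


root = TrieNode(max_labels=2)        # small cap just to show truncation
root.insert(["_h", "ai"], "hai")
root.insert(["_h", "ai"], "high")
root.insert(["_h", "ai"], "height")  # dropped: the node is already full

node = root.children["_h"].children["ai"]
print(node.labels)  # ['hai', 'high']
```

With the cap lowered to 2, the third homophone is silently dropped; in wav2letter the cap is larger, but the same truncation principle applies.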
The spelling (lexicon entry) of the two words is identical, but the word 'hai' should be chosen because the LM score of 'high' is much lower by comparison. After adding the OOV word 'high' to the lexicon file, the beam dump only outputs the OOV word, and I don't know why. Can you give me some ideas for further debugging? I have tested the language model on the two sentences and am sure the scores are right, so the problem is only in the decoding phase.

Additional info about the word-piece (wp) decode:

|T|: trời ơi em a ca đưa cái cái điện thoại cho nó kêu là ba mươi tám ngàn lận cái a cái nó móc ra tờ hai trăm hỏi có tiền thối không nói vậy đó
|P|: trời ơi em a a đưa cái cái điện thoại cho nó kêu là ba mươi tám ngàn mà thế cái nó móc ra từ high trăm hồi có tiền thối không nói vậy đó
|t|: t r ờ i _ ơ i _ e m _ a _ c a _ đ ư a _ c á i _ c á i _ đ i ệ n _ t h o ạ i _ c h o _ n ó _ k ê u _ l à _ b a _ m ư ơ i _ t á m _ n g à n _ l ậ n _ c á i _ a _ c á i _ n ó _ m ó c _ r a _ t ờ _ h a i _ t r ă m _ h ỏ i _ c ó _ t i ề n _ t h ố i _ k h ô n g _ n ó i _ v ậ y _ đ ó
|p|: _ t r ờ i _ ơ i _ e m _ a _ a _ đ ư a _ c á i _ c á i _ đ i ệ n _ t h o ạ i _ c h o _ n ó _ k ê u _ l à _ b a _ m ư ơ i _ t á m _ n g à n _ m à _ t h ế _ c á i _ n ó _ m ó c _ r a _ t ừ _ h a i _ t r ă m _ h ồ i _ c ó _ t i ề n _ t h ố i _ k h ô n g _ n ó i _ v ậ y _ đ ó

Platform and Hardware

All experiments were run in Docker. Update: I tried the latest versions of flashlight and wav2letter, with the same result.

xuqiantong commented 3 years ago

You should always expect the language model to return a worse score on OOV words, so consider: 1) retraining a new language model that includes most of the OOVs; 2) retuning the decoder parameters (LM weight, word score, etc.).

trangtv57 commented 3 years ago

Thanks @xuqiantong. As I mentioned in my question, I have logged the decoder output and I am sure the LM score and word score are correct. My LM is good enough to determine whether a phrase should be treated as OOV or not, and I don't know why you assume the LM score of the OOV is always worse; I tested a specific case where I know my LM is good. My problem is that an OOV word and an in-dictionary word have the same pronunciation (the same lexicon spelling), so I think the trie built in wav2letter does not handle this.

trangtv57 commented 3 years ago

@tlikhomanenko Can you help with my question? I see you commented on a related issue, #693.

trangtv57 commented 3 years ago

I have figured out what my problem was and how to solve it, so I'm closing the issue. Thanks.

tlikhomanenko commented 3 years ago

Here you can see https://github.com/facebookresearch/wav2letter/blob/v0.2/src/libraries/decoder/LexiconDecoder.cpp#L112 that we have a loop over all possible words for the same trie node (so the labels here will be "hai", "height", "high" for the same path over the token sequence "_hai"). The only restriction is that we process at most 6 words with the same spelling, per https://github.com/facebookresearch/wav2letter/blob/v0.2/src/libraries/decoder/Trie.h#L16 — only the first (at most) 6 words with the same spelling will be added to the beam.

tlikhomanenko commented 3 years ago

@trangtv57 could you post what was the problem?

tlikhomanenko commented 3 years ago

It seems you probably have a problem with hypothesis pruning; that is the only place where a hypothesis with a better score, like the one you showed above, could get removed.

trangtv57 commented 3 years ago

Hi @tlikhomanenko, the problem is that I have many OOVs with the same spelling, so my in-dictionary word gets pruned because of that setting (the per-node limit on words with the same spelling). Anyway, I think you should expose this option in defines.cpp or document it in the wiki; I couldn't work it out even after reading and debugging the code. Thanks.
