flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

decode error: both -am & --emission_dir methods failed! #213

Closed: phecda-xu closed this issue 5 years ago

phecda-xu commented 5 years ago

Hello, I have trained on the THCHS-30 dataset. The train and test stages are OK, but decoding failed!

xuqiantong commented 5 years ago

Hi,

* When using the `-am` flag, the program is killed at the trie-construction step. Can you show me the lexicon file you are using? Is `七十二岁` a valid word in it, with the spelling `七 十 二 岁`? If yes, is `七十二岁` also a valid word in your LM?

* When using the `--emission_dir` flag, you should not run into the branch `I0221 11:26:01.354352  6313 Decode.cpp:145] [Serialization] Running forward pass ...`, because the forward pass is only called when you use the acoustic model instead of the emission set. So please check your input flags again.
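
(For context: a wav2letter lexicon file maps each word to its token spelling, one entry per line, with the word first and its tokens after it, whitespace-separated. The entries below are illustrative only, built from words appearing in this thread:)

```
七十二岁  七 十 二 岁
咸阳市  咸 阳 市
改为  改 为
```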

phecda-xu commented 5 years ago

Thanks for your reply!
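
(To make the two modes concrete, here is a hedged sketch of the two invocations being contrasted. The binary name and paths are placeholders; only the `-am`, `--emission_dir`, `--beamsize`, and `--beamscore` flags are taken from this thread:)

```
# Mode 1: decode from the acoustic model -- the forward pass runs first.
./decoder --am=/path/to/am.bin --beamsize=1000 --beamscore=100 ...

# Mode 2: decode from a precomputed emission set -- no forward pass.
./decoder --emission_dir=/path/to/emissions --beamsize=1000 --beamscore=100 ...
```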

xuqiantong commented 5 years ago

Hi, for the `-am` reappearing issue, I think it's a bug; I will send out a fix tomorrow.

Your lexicon file and LM look good to me. I believe the program is killed due to memory usage. We need to build a trie before decoding, and in your case I can imagine it will be huge. You have 173604 words and 6319 tokens, so each node in the trie holds a children-pointer vector of 6319 elements (at 8 bytes per pointer, roughly 50 KB per node). In my experience, 200K words with 5K tokens gives a trie of about 15 GB. Since you are using 8 threads, about 120 GB of memory would be required for the tries alone.

One possible solution is to use fewer threads. The other is to make a code change, if you want. Hint: move https://github.com/facebookresearch/wav2letter/blob/master/Decode.cpp#L252-L308 outside the decoding function, so that the LM and trie are shared among all the threads.
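
(To illustrate the hint, a minimal C++ sketch of the refactor, assuming a plain `std::thread` worker pool; every name below is a placeholder rather than the actual wav2letter++ API:)

```cpp
#include <memory>
#include <thread>
#include <vector>

// Placeholder types standing in for the real LM and trie classes.
struct LM {};
struct Trie {};

std::shared_ptr<const LM> buildLM() {
  return std::make_shared<LM>(); // expensive in reality (LM load)
}

std::shared_ptr<const Trie> buildTrie(const LM& /*lm*/) {
  return std::make_shared<Trie>(); // expensive: the ~15 GB structure
}

// Each worker now receives read-only shared pointers instead of
// building its own copies of the LM and trie.
void decodeThread(int tid,
                  std::shared_ptr<const LM> lm,
                  std::shared_ptr<const Trie> trie) {
  (void)tid; (void)lm; (void)trie;
  // ... run beam search over this thread's shard of samples ...
}

int main() {
  // Build once, before spawning workers (this is the construction
  // that Decode.cpp:252-308 used to run once per thread).
  auto lm = buildLM();
  auto trie = buildTrie(*lm);

  const int nThreads = 8; // memory now holds 1 trie, not 8
  std::vector<std::thread> workers;
  for (int t = 0; t < nThreads; ++t) {
    workers.emplace_back(decodeThread, t, lm, trie);
  }
  for (auto& w : workers) {
    w.join();
  }
  return 0;
}
```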

phecda-xu commented 5 years ago

Hello, thanks for your reply! I set the decoder thread count to 1 (with --beamsize=1000, --beamscore=100), and it does work! My machine only has 15.6 GB of memory, and it takes almost 15.4 GB to build the trie and do the other operations, so decoding is very slow.

```
I0222 06:12:12.423243  6539 Utils.cpp:339] [Words] 175353 tokens loaded.
I0222 06:12:12.484014  6539 Decode.cpp:121] Number of words: 175353
I0222 06:12:12.559342  6539 NumberedFilesLoader.cpp:29] Adding dataset /data/wav2letter++/THCH/data/test ...
I0222 06:12:12.741766  6539 NumberedFilesLoader.cpp:68] 669 files found.
I0222 06:12:19.028533  6539 Utils.cpp:102] Filtered 0/669 samples
I0222 06:12:19.028617  6539 W2lNumberedFilesDataset.cpp:57] Total batches (i.e. iters): 669
I0222 06:12:19.028681  6539 Decode.cpp:145] [Serialization] Running forward pass ...
I0222 06:12:27.173827  6539 Decode.cpp:181] [Dataset] Number of samples per thread: 50
I0222 06:12:28.619987  6579 Decode.cpp:262] [Decoder] LM constructed.
I0222 06:27:08.930788  6579 Decode.cpp:296] [Decoder] Trie planted.
I0222 06:30:53.158293  6579 Decode.cpp:308] [Decoder] Trie smeared.
I0222 06:30:53.177062  6579 Decode.cpp:314] [Decoder] Decoder loaded in thread: 0
|T|: 而 此时 正赶上 咸阳 地 市 机构 变化 原 咸阳市 改为 秦都区 咸阳 地区 改为 咸阳市
|P|: 而且 是 站 咸阳 地 机构 变化 原 为 其中 咸阳 地区 改为 咸阳市
|t|: 而|此时|正赶上|咸阳|地|市|机构|变化|原|咸阳市|改为|秦都区|咸阳|地区|改为|咸阳市
|p|: 而且|是|站|咸阳|地|机构|变化|原|为|其中|咸阳|地区|改为|咸阳市
```

It takes almost 18 minutes to prepare the trie and other things before the decode results are printed, and about 9 minutes to decode one sample. Anyway, it finally works! Maybe changing some parameters can speed it up; I'll try. Thanks for your help!
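
(Since the trie cost is fixed per run, the beam parameters are what control the per-sample decode time. A hedged illustration using only the two flags already mentioned in this thread; the values are made up:)

```
# Illustrative only: a tighter beam trades accuracy for speed.
--beamsize=250   # keep fewer hypotheses per step (was 1000)
--beamscore=25   # prune hypotheses scoring far below the best (was 100)
```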

xuqiantong commented 5 years ago

As you know, a large search space (token set size) will definitely hurt the performance of any beam-search engine: each hypothesis on the beam expands into one candidate per token, so a 6319-token set means roughly 200 times more expansions per step than English's ~30. We only optimized our decoder for English. Suggestions: