Open chitralekhabhat opened 2 years ago
Looks like the error arises from not having a pronunciation for
Hi, did you resolve your issue? I am getting the same error and do not know how to solve it. My call to utils/prepare_lang.sh does not use --phone-symbol-table: utils/prepare_lang.sh --num-extra-phone-disambig-syms $extra data/subword_dict "" data/subword_lang/local data/subword_lang || exit 1;
Hi,
I am trying to use subword units with the Kaldi librispeech recipe. I have used the code snippet from the README in stage 3 of the librispeech recipe.
if [ $stage -le 3 ]; then
  local/prepare_dict.sh --stage 3 --nj 30 --cmd "$train_cmd" \
    data/local/lm data/local/lm data/subword_dict
  utils/prepare_lang.sh --phone-symbol-table data/lang/phones.txt \
    --num-extra-phone-disambig-syms $extra \
    data/subword_dict "<UNK>" data/subword_lang/local data/subword_lang
  subdir=data/subword_lang
  tmpdir=data/subword_lang/local
  local/make_lfst_wb.py $(tail -n$extra $subdir/phones/disambig.txt) < $tmpdir/lexiconp_disambig.txt | \
    fstcompile --isymbols=$subdir/phones.txt --osymbols=$subdir/words.txt \
      --keep_isymbols=false --keep_osymbols=false | \
    fstaddselfloops $dir/phones/wdisambig_phones.int $subdir/phones/wdisambig_words.int | \
    fstarcsort --sort_type=olabel > $subdir/L_disambig.fst
fi
Please let me know if I need to prepare data/subword_dict separately or if this is correct. Currently I get the error below:
FATAL: FstCompiler: Symbol "<w>" is not mapped to any integer arc olabel, symbol table = data/subword_lang/words.txt, source = standard input, line = 1
ERROR: FstHeader::Read: Bad FST header: -
ERROR (fstaddselfloops[5.5.971~1-07043]:ReadFstKaldi():kaldi-fst-io.cc:35) Reading FST: error reading FST header from standard input

[ Stack-Trace: ]
/home/chitralekha/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0xb42) [0x7f81e210b742]
fstaddselfloops(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x21) [0x557e2ee630cf]
/home/chitralekha/kaldi/src/lib/libkaldi-fstext.so(fst::ReadFstKaldi(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)+0x1ba) [0x7f81e25685db]
fstaddselfloops(main+0x123) [0x557e2ee62afd]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7f81e1794bf7]
fstaddselfloops(_start+0x2a) [0x557e2ee628fa]

kaldi::KaldiFatalError
ERROR: FstHeader::Read: Bad FST header: standard input
Traceback (most recent call last):
  File "local/make_lfst_wb.py", line 65, in <module>
    print_word(word, phones, False, True, 3, 0)
  File "local/make_lfst_wb.py", line 40, in print_word
    print("{}\t{}\t{}\t{}".format(cur_state,next_state,phones[0],word))
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='ANSI_X3.4-1968'>
BrokenPipeError: [Errno 32] Broken pipe
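The first FATAL line is the root failure: fstcompile cannot find "<w>" in data/subword_lang/words.txt, and the later header and broken-pipe errors follow from the pipeline receiving no FST. A quick way to confirm this might be to check whether the symbol appears in the first column of the files involved. The sketch below is only a diagnostic; has_symbol is a hypothetical helper name, not part of Kaldi, and it assumes the usual whitespace-separated layout with the symbol in the first column.

```shell
# Hypothetical helper (not part of Kaldi): succeeds if symbol $1 appears
# as the first whitespace-separated field of any line in file $2.
has_symbol() {
  awk -v sym="$1" '$1 == sym { found = 1 } END { exit !found }' "$2"
}

# Paths are taken from the snippet above; files that do not exist are skipped.
for f in data/subword_lang/words.txt data/subword_lang/local/lexiconp_disambig.txt; do
  if [ -f "$f" ]; then
    if has_symbol "<w>" "$f"; then
      echo "<w> found in $f"
    else
      echo "<w> missing from $f"
    fi
  fi
done
```

If "<w>" is missing from words.txt, the cause is most likely upstream in how data/subword_dict was prepared, rather than in the FST commands themselves.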