You have to install fairseq from git for a version of preprocess.py with --dict-only.
Also:
prepare_text.sh:54: command not found: lmplz
prepare_text.sh:55: command not found: build_binary
You're missing kenlm
OK, that first bit worked fine:
pip uninstall fairseq
# (navigate to the top level of the repo)
pip install --editable ./
But now I'm not sure how to install kenlm. I tried pip install https://github.com/kpu/kenlm/archive/master.zip, but the error persists. I will try cloning and building the repo with make, etc.
Yeah; you'll need cmake for that. Also:
apt-get -y install libeigen3-dev liblzma-dev zlib1g-dev libbz2-dev
I've been running make, then setup.py, but I'm not sure if that's strictly necessary (maybe just running setup.py is enough)
Followed instructions at https://github.com/kpu/kenlm/blob/master/BUILDING to install dependencies for kenlm. What they don't mention is that you need to take the resulting binaries from kenlm/build/bin/ and copy them to /usr/bin
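For reference, the full sequence looks roughly like this (a sketch following kenlm's BUILDING notes; the final copy step is the part the docs don't mention):
git clone https://github.com/kpu/kenlm.git
cd kenlm
mkdir -p build && cd build
cmake ..
make -j 4
# lmplz and build_binary need to be on PATH for prepare_text.sh to find them
sudo cp bin/* /usr/bin/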
That seems to have fixed those errors; now trying to figure out: fatal error: PHONEMIZER_ESPEAK_PATH=espeak not found is not an executable file
Warning : `load_model` does not return WordVectorModel or SupervisedModel any more, but a `FastText` object which is very similar.
2021-06-03 18:37:41 | INFO | fairseq_cli.preprocess | Namespace(no_progress_bar=False, log_interval=100, log_format=None, log_file=None, tensorboard_logdir=None, wandb_project=None, azureml_logging=False, seed=1, cpu=False, tpu=False, bf16=False, memory_efficient_bf16=False, fp16=False, memory_efficient_fp16=False, fp16_no_flatten_grads=False, fp16_init_scale=128, fp16_scale_window=None, fp16_scale_tolerance=0.0, min_loss_scale=0.0001, threshold_loss_scale=None, amp=False, amp_batch_retries=2, amp_init_scale=128, amp_scale_window=None, user_dir=None, empty_cache_freq=0, all_gather_list_size=16384, model_parallel_size=1, quantization_config_path=None, profile=False, reset_logging=False, suppress_crashes=False, use_plasma_view=False, plasma_path='/tmp/plasma', criterion='cross_entropy', tokenizer=None, bpe=None, optimizer=None, lr_scheduler='fixed', simul_type=None, scoring='bleu', task='translation', source_lang=None, target_lang=None, trainpref='/home/jovyan/work/WikiDumps/wiki_sw_head_wav2vecu_prepared/lm.upper.lid.txt', validpref=None, testpref=None, align_suffix=None, destdir='/home/jovyan/work/WikiDumps/wiki_sw_head_wav2vecu_prepared', thresholdtgt=0, thresholdsrc=2, tgtdict=None, srcdict=None, nwordstgt=-1, nwordssrc=-1, alignfile=None, dataset_impl='mmap', joined_dictionary=False, only_source=True, padding_factor=1, workers=1, dict_only=True)
fatal error: PHONEMIZER_ESPEAK_PATH=espeak not found is not an executable file
The above error seems to be coming from this line in prepare_text.sh:
sed 's/$/ 1/' $target_dir/words.txt | PHONEMIZER_ESPEAK_PATH=$(which espeak) phonemize -o $target_dir/phones.txt -p ' ' -w '' -l $ph_lg -j 70 --language-switch remove-flags
If you replace each instance of which espeak with which espeak-ng, that fixes it.
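If you'd rather not edit by hand, a hypothetical one-liner for that substitution (assumes GNU sed; check the diff afterwards):
sed -i 's/which espeak)/which espeak-ng)/g' prepare_text.sh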
I had some errors with kaldi_initializer: lines 51 and 52 of the script lacked $FAIRSEQ_ROOT/.
Oh yeah, about copying the kenlm binaries: kenlm was intended to be embedded in other projects, so it doesn't have any sort of install mechanism.
Sorry to "hijack" this issue, but it seems that the kaldi_initializer shows the following issue for the two lines after building the word lm:
Traceback (most recent call last):
  File "/nobackup/users/junruin2/fairseq//examples/speech_recognition/kaldi/kaldi_initializer.py", line 677, in cli_main
    initalize_kaldi(cfg)
  File "/nobackup/users/junruin2/fairseq//examples/speech_recognition/kaldi/kaldi_initializer.py", line 616, in initalize_kaldi
    cfg.out_labels = cfg.in_labels
omegaconf.errors.MissingMandatoryValue: Missing mandatory value: in_labels
    full_key: in_labels
    reference_type=Optional[Dict[Union[str, Enum], Any]]
    object_type=dict
What argument should be passed to it? My guess was that it should be "phn" for in_labels and "wrd" for out_labels, as the code seems to be building an HCLG graph for later decoding. However, I don't see where kaldi_initializer is used later.
in_labels is phn for this script; out_labels is copied from in_labels if omitted.
I've got most of the errors figured out, only one left:
Traceback (most recent call last):
File "/home/jovyan/work/fairseq//fairseq_cli/preprocess.py", line 401, in <module>
cli_main()
File "/home/jovyan/work/fairseq//fairseq_cli/preprocess.py", line 397, in cli_main
main(args)
File "/home/jovyan/work/fairseq//fairseq_cli/preprocess.py", line 287, in main
make_all(args.source_lang, src_dict)
File "/home/jovyan/work/fairseq//fairseq_cli/preprocess.py", line 255, in make_all
make_dataset(vocab, args.trainpref, "train", lang, num_workers=args.workers)
File "/home/jovyan/work/fairseq//fairseq_cli/preprocess.py", line 251, in make_dataset
make_binary_dataset(vocab, input_prefix, output_prefix, lang, num_workers)
File "/home/jovyan/work/fairseq//fairseq_cli/preprocess.py", line 184, in make_binary_dataset
100 * sum(replaced.values()) / n_seq_tok[1],
ZeroDivisionError: division by zero
Which comes from this line
python $FAIRSEQ_ROOT/fairseq_cli/preprocess.py --dataset-impl mmap --trainpref $target_dir/phones/lm.phones.filtered.txt --workers 70 --only-source --destdir $target_dir/phones --srcdict $target_dir/phones/dict.phn.txt
Try adding --thresholdsrc 2 or something similarly low; I didn't use this for making the phone LM data, but setting the threshold was needed for later calls to preprocess.py.
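Concretely, adding the threshold to the call quoted above would look like this (just a sketch; pick a value that fits your corpus):
python $FAIRSEQ_ROOT/fairseq_cli/preprocess.py --dataset-impl mmap --trainpref $target_dir/phones/lm.phones.filtered.txt --workers 70 --only-source --destdir $target_dir/phones --srcdict $target_dir/phones/dict.phn.txt --thresholdsrc 2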
I'll give it a try.
I note with interest that some of the files in the output dir are empty, particularly lexicon_filtered.lst
(base) jovyan@user-ofmghcmafhv-jtfbeefyexclusive-0:~/work/WikiDumps$ ll wiki_sw_head_wav2vecu_prepared/
total 72
drwxr-sr-x 3 jovyan users 4096 Jun 3 20:41 ./
drwxr-sr-x 4 jovyan users 4096 Jun 3 20:41 ../
-rw-r--r-- 1 jovyan users 4503 Jun 3 20:41 dict.txt
-rw-r--r-- 1 jovyan users 0 Jun 3 20:41 kenlm.wrd.o40003.arpa
-rw-r--r-- 1 jovyan users 0 Jun 3 20:41 lexicon_filtered.lst
-rw-r--r-- 1 jovyan users 10256 Jun 3 20:41 lexicon.lst
-rw-r--r-- 1 jovyan users 24572 Jun 3 20:41 lm.upper.lid.txt
drwxr-sr-x 2 jovyan users 4096 Jun 3 20:41 phones/
-rw-r--r-- 1 jovyan users 6773 Jun 3 20:41 phones.txt
-rw-r--r-- 1 jovyan users 1317 Jun 3 20:41 preprocess.log
-rw-r--r-- 1 jovyan users 3483 Jun 3 20:41 words.txt
(base) jovyan@user-ofmghcmafhv-jtfbeefyexclusive-0:~/work/WikiDumps$ ll wiki_sw_head_wav2vecu_prepared/phones
total 24
drwxr-sr-x 2 jovyan users 4096 Jun 3 20:42 ./
drwxr-sr-x 3 jovyan users 4096 Jun 3 20:42 ../
-rw-r--r-- 1 jovyan users 8 Jun 3 20:41 dict.phn.txt
-rw-r--r-- 1 jovyan users 8 Jun 3 20:41 dict.txt
-rw-r--r-- 1 jovyan users 0 Jun 3 20:42 lm.phones.filtered.04.arpa
-rw-r--r-- 1 jovyan users 0 Jun 3 20:41 lm.phones.filtered.txt
-rw-r--r-- 1 jovyan users 2763 Jun 3 20:41 preprocess.log
-rw-r--r-- 1 jovyan users 0 Jun 3 20:41 train.bin
-rw-r--r-- 1 jovyan users 26 Jun 3 20:41 train.idx
Perhaps some input file to that line is empty when it shouldn't be?
Something went wrong, there should be output in those files
Specifically, the divide-by-zero is happening on this line, which means something went wrong before then, I think: https://github.com/pytorch/fairseq/blob/c47a9b2eef0f41b0564c8daf52cb82ea97fc6548/examples/wav2vec/unsupervised/scripts/prepare_text.sh#L44
Nah, I was getting the same thing and the input was fine. The same divide-by-zero later was happening because the threshold was set too high, I'm going to assume the same is happening here.
Added the threshold, still getting divide-by-zero.
I'm still curious, though; not sure why lexicon_filtered.lst is empty.
python filter_lexicon.py -d $target_dir/phones/dict.txt < $target_dir/lexicon.lst >! $target_dir/lexicon_filtered.lst
is the line that creates lexicon_filtered.lst. $target_dir/lexicon.lst doesn't seem to be empty. dict.txt in $target_dir/phones/ has only <SIL> 0 in it; dict.txt in $target_dir/ has plenty of stuff in it. Hmmmmmm...
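A quick way to sanity-check which intermediate file is the culprit (illustrative commands, using the paths above):
wc -l "$target_dir"/lexicon.lst "$target_dir"/phones/dict.txt "$target_dir"/lexicon_filtered.lst
head "$target_dir"/phones/dict.txt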
Specifically I added the threshold to line 44, btw
I note that https://github.com/pytorch/fairseq/blob/c47a9b2eef0f41b0564c8daf52cb82ea97fc6548/examples/wav2vec/unsupervised/scripts/prepare_text.sh#L38 already has a threshold param, perhaps that's where I was supposed to change it.
OK, even with threshold 2 on every instance of preprocess.py, still getting divide by zero
Also, I missed this, but I'm getting this error as well:
Primary config directory not found.
Check that the config directory '/home/jovyan/work/fairseq/examples/speech_recognition/kaldi/config' exists and readable
I created an empty config directory there to bypass the test and ran with
lg=$lg python $FAIRSEQ_ROOT/examples/speech_recognition/kaldi/kaldi_initializer.py kaldi_root=/path/to/kaldi in_labels=phn fst_dir=$target_dir/fst/phn_to_words_sil lm_arpa=$target_dir/kenlm.wrd.o40003.arpa wav2letter_lexicon=$target_dir/lexicon_filtered.lst data_dir=$target_dir/phones "blank_symbol='<SIL>'"
lg=$lg python $FAIRSEQ_ROOT/examples/speech_recognition/kaldi/kaldi_initializer.py kaldi_root=/path/to/kaldi in_labels=phn fst_dir=$target_dir/fst/phn_to_words lm_arpa=$target_dir/kenlm.wrd.o40003.arpa wav2letter_lexicon=$target_dir/lexicon_filtered.lst data_dir=$target_dir/phones
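For anyone following along, the bypass itself is just this (path taken from the error message above):
mkdir -p "$FAIRSEQ_ROOT"/examples/speech_recognition/kaldi/config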
I tried actually creating the yaml for it, and it ignores it completely 🤷
i am working on more comprehensive instructions on how to run the pipeline - should have something by next week - stay tuned. meanwhile i can answer questions here if need be
I think I celebrated too early; when running kaldi_initializer it seems the two lines just quit in the middle before finishing building the final decoding graph. The text corpus was the LibriSpeech LM corpus (https://www.openslr.org/11; using librispeech-lm-norm.txt.gz as-is before feeding it to prepare_text.sh).
For the first line:
[2021-06-03 15:55:15,071][main][INFO] - Creating /nobackup/users/junruin2/fairseq/examples/wav2vec/unsupervised/librispeech_files/unpaired_text/fst/phn_to_words_sil/LG.phn.kenlm.wrd.o40003.fst
[2021-06-03 18:19:26,628][main][ERROR] - cmd: [PosixPath('/nobackup/users/junruin2/kaldi/src/fstbin/fstpushspecial')], err: /nobackup/users/junruin2/kaldi/src/fstbin/fstpushspecial
Traceback (most recent call last):
  File "/nobackup/users/junruin2/fairseq//examples/speech_recognition/kaldi/kaldi_initializer.py", line 677, in cli_main
    initalize_kaldi(cfg)
  File "/nobackup/users/junruin2/fairseq//examples/speech_recognition/kaldi/kaldi_initializer.py", line 657, in initalize_kaldi
    kaldi_root, fst_dir, unique_label, lexicon_graph, grammar_graph
  File "/nobackup/users/junruin2/fairseq//examples/speech_recognition/kaldi/kaldi_initializer.py", line 273, in create_LG
    check=True,
  File "/nobackup/users/junruin2/anaconda3/envs/espnet-pt1.7.1/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '[PosixPath('/nobackup/users/junruin2/kaldi/src/fstbin/fstpushspecial')]' died with <Signals.SIGKILL: 9>.
For the second line:
[2021-06-03 21:57:41,011][main][INFO] - Creating /nobackup/users/junruin2/fairseq/examples/wav2vec/unsupervised/librispeech_files/unpaired_text/fst/phn_to_words/HLGa.phn.kenlm.wrd.o40003.fst
[2021-06-03 22:29:25,118][main][ERROR] - cmd: [PosixPath('/nobackup/users/junruin2/kaldi/src/fstbin/fsttablecompose'), PosixPath('/nobackup/users/junruin2/fairseq/examples/wav2vec/unsupervised/librispeech_files/unpaired_text/fst/phn_to_words/H.phn.fst'), PosixPath('/nobackup/users/junruin2/fairseq/examples/wav2vec/unsupervised/librispeech_files/unpaired_text/fst/phn_to_words/LG.phn.kenlm.wrd.o40003.fst')], err: /nobackup/users/junruin2/kaldi/src/fstbin/fsttablecompose /nobackup/users/junruin2/fairseq/examples/wav2vec/unsupervised/librispeech_files/unpaired_text/fst/phn_to_words/H.phn.fst /nobackup/users/junruin2/fairseq/examples/wav2vec/unsupervised/librispeech_files/unpaired_text/fst/phn_to_words/LG.phn.kenlm.wrd.o40003.fst
Traceback (most recent call last):
  File "/nobackup/users/junruin2/fairseq//examples/speech_recognition/kaldi/kaldi_initializer.py", line 677, in cli_main
    initalize_kaldi(cfg)
  File "/nobackup/users/junruin2/fairseq//examples/speech_recognition/kaldi/kaldi_initializer.py", line 660, in initalize_kaldi
    kaldi_root, fst_dir, unique_label, h_graph, lg_graph, disambig_in_units_file_int
  File "/nobackup/users/junruin2/fairseq//examples/speech_recognition/kaldi/kaldi_initializer.py", line 458, in create_HLGa
    check=True,
  File "/nobackup/users/junruin2/anaconda3/envs/espnet-pt1.7.1/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '[PosixPath('/nobackup/users/junruin2/kaldi/src/fstbin/fsttablecompose'), PosixPath('/nobackup/users/junruin2/fairseq/examples/wav2vec/unsupervised/librispeech_files/unpaired_text/fst/phn_to_words/H.phn.fst'), PosixPath('/nobackup/users/junruin2/fairseq/examples/wav2vec/unsupervised/librispeech_files/unpaired_text/fst/phn_to_words/LG.phn.kenlm.wrd.o40003.fst')]' died with <Signals.SIGKILL: 9>.
Any idea why that would be the case? Also, at which stage are the FSTs built by kaldi_initializer.py used? Thanks!
most likely you ran out of cpu memory. try to prune the lm a bit more before building fst with it
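Pruning can be added where prepare_text.sh builds the word-level ARPA LM (the kenlm.wrd.o40003.arpa file above), e.g. by adding --prune to the existing lmplz call; a rough sketch, with illustrative prune counts and any other flags left as in the script:
lmplz -o 4 --prune 0 0 1 3 < $target_dir/lm.upper.lid.txt > $target_dir/kenlm.wrd.o40003.arpa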
Here's what I have for dependencies so far, @alexeib does it look right to you?
pip install phonemizer fasttext
git clone https://github.com/pytorch/fairseq
cd fairseq
pip install --editable ./
# phonemizer dependencies
sudo apt-get install festival espeak-ng mbrola
#kenlm dependencies from official website
sudo apt-get install build-essential libboost-all-dev cmake zlib1g-dev libbz2-dev liblzma-dev
You've also got to build kenlm as described above and copy the built binaries to /usr/bin; you need to apt-install the dependencies first.
Sorry for jumping into this issue again, but I assumed the people involved in the discussion here would be interested in at least replicating some results, so here it goes. Has anyone here successfully trained the GAN model on any of the corpora used by the original paper (https://ai.facebook.com/research/publications/unsupervised-speech-recognition/) and achieved somewhere close to the error rate there? If so, what hyper-parameters or modifications to the currently published code did you change to achieve it? My attempts have been less than successful (https://github.com/pytorch/fairseq/issues/3581) for the past week and a half, and I just cannot figure out why...
(Edit 06/08/2021: It works on LibriSpeech-100h! It was my mistake for forgetting to check for .tsv and .phn alignment after removing the silences and re-running wav2vec_manifest.py. The issue is still there for TIMIT however.)
@alexeib
Hi, when I tried to use the kaldi decoder for wav2vec_generate.py, I found that for some reason the final HLG graphs were not successfully built when running kaldi_initializer.py in prepare_text.sh. I kept getting the following error, which occurred when executing the very last create_HLG() function within kaldi_initializer.py.
In file included from /home/software/spack/gcc/8.3.0-xdjkb2mmftikxvoeeyaxtxcjtpltcgiz/include/c++/8.3.0/random:51,
                 from /nobackup/users/junruin2/pykaldi-py3.7.9/tools/kaldi/tools/openfst-1.6.7/include/fst/randgen.h:14,
                 from /nobackup/users/junruin2/pykaldi-py3.7.9/tools/kaldi/tools/openfst-1.6.7/include/fst/randequivalent.h:15,
                 from /nobackup/users/junruin2/pykaldi-py3.7.9/tools/kaldi/tools/openfst-1.6.7/include/fst/fstlib.h:61,
                 from /nobackup/users/junruin2/pykaldi-py3.7.9/tools/kaldi/src/fstext/fstext-lib.h:22,
                 from /nobackup/users/junruin2/fairseq/examples/speech_recognition/kaldi/add-self-loop-simple.cc:9:
/home/software/spack/gcc/8.3.0-xdjkb2mmftikxvoeeyaxtxcjtpltcgiz/include/c++/8.3.0/bits/random.tcc:2737:5: note: candidate: 'template<class _IntType, class _CharT, class _Traits> std::basic_ostream<_CharT, _Traits>& std::operator<<(std::basic_ostream<_CharT, _Traits>&, const std::discrete_distribution<_IntType>&)'
     operator<<(std::basic_ostream<_CharT, _Traits>& __os,
/home/software/spack/gcc/8.3.0-xdjkb2mmftikxvoeeyaxtxcjtpltcgiz/include/c++/8.3.0/bits/random.tcc:2737:5: note: template argument deduction/substitution failed:
/nobackup/users/junruin2/fairseq/examples/speech_recognition/kaldi/add-self-loop-simple.cc:91:52: note: 'kaldi::MessageLogger' is not derived from 'std::basic_ostream<_CharT, _Traits>'
     KALDI_LOG << "Writing FST to " << output << std::endl;
The kaldi version I have is from the latest pykaldi compatible fork: https://github.com/pykaldi/kaldi/tree/pykaldi_02
However, the code seems to be running for now as long as I remove all the lines that stream to KALDI_LOG. I really don't think it will affect anything other than logging, though.
Are the more comprehensive instructions for running the pipeline finished?
yes, instructions should be good now.
regarding building the binary for adding self loops - i have plans to rewrite this using pykaldi api instead of c++ but it may take some time. meanwhile, you probably need to build the kaldi toolkit in your pykaldi dir (they have a script for that I believe)
we will have working timit instructions up next week
for librispeech, you can get it to work with as little as 10h of audio, but depending on what you use for text you may need to adjust the 1k threshold when building phone dict
Hello, I have a problem with prepare_text.sh: the lexicon.lst file is not created by prepare_text.sh, but the script wants to use it. What am I supposed to do about this? Another question of mine is about language: can I fine-tune for my language? @alexeib
it should be created by this line in the script:
paste $target_dir/words.txt $target_dir/phones.txt >! $target_dir/lexicon.lst
maybe some intermediate step failed?
sorry i don't understand the language question - what exactly do you want to finetune?
I saw that the line should work, but it did not, so I created lexicon.lst manually. Then lexicon_filtered.lst was absent and it failed again. Preparing the text is challenging for me; I don't know the exact reason for this situation. Some intermediate steps may have failed, but I did not get any error about them. Seeing the outputs of a working preparation step would be great. @alexeib I managed to create a filtered lexicon and now I get a zero-division error.
Hello, I can't run the script prepare_text.sh; I get an error:
[2021-06-22 19:44:15,455][main][INFO] - Creating /hdd/conda_kaldi/exp_unsup_asr/prepare_text/fst/phn_to_phn_sil/H.phn.fst
[2021-06-22 19:44:15,556][main][INFO] - Creating /hdd/conda_kaldi/exp_unsup_asr/prepare_text/fst/phn_to_phn_sil/L.phn.lm.phones.filtered.06.fst (in units: /hdd/conda_kaldi/exp_unsup_asr/prepare_text/fst/phn_to_phn_sil/kaldi_dict.phn_disambig.txt)
[2021-06-22 19:44:15,686][main][INFO] - Creating /hdd/conda_kaldi/exp_unsup_asr/prepare_text/fst/phn_to_phn_sil/LG.phn.lm.phones.filtered.06.fst
[2021-06-22 19:44:16,007][main][INFO] - Creating /hdd/conda_kaldi/exp_unsup_asr/prepare_text/fst/phn_to_phn_sil/HLGa.phn.lm.phones.filtered.06.fst
[2021-06-22 19:44:16,500][main][INFO] - Creating /hdd/conda_kaldi/exp_unsup_asr/prepare_text/fst/phn_to_phn_sil/HLG.phn.lm.phones.filtered.06.fst
[2021-06-22 19:44:17,414][main][ERROR] - cmd: [PosixPath('/hdd/conda_kaldi/rnd_ds/fairseq/examples/speech_recognition/kaldi/add-self-loop-simple'), PosixPath('/hdd/conda_kaldi/exp_unsup_asr/prepare_text/fst/phn_to_phn_sil/HLGa.phn.lm.phones.filtered.06.fst'), PosixPath('/hdd/conda_kaldi/exp_unsup_asr/prepare_text/fst/phn_to_phn_sil/HLG.phn.lm.phones.filtered.06.fst')], err: b''
Traceback (most recent call last):
  File "/hdd/conda_kaldi/rnd_ds/fairseq/examples/speech_recognition/kaldi/kaldi_initializer.py", line 677, in cli_main
    initalize_kaldi(cfg)
  File "/hdd/conda_kaldi/rnd_ds/fairseq/examples/speech_recognition/kaldi/kaldi_initializer.py", line 662, in initalize_kaldi
    hlg_graph = create_HLG(kaldi_root, fst_dir, unique_label, hlga_graph)
  File "/hdd/conda_kaldi/rnd_ds/fairseq/examples/speech_recognition/kaldi/kaldi_initializer.py", line 595, in create_HLG
    subprocess.run(
  File "/hdd/conda_kaldi/rnd_ds/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '[PosixPath('/hdd/conda_kaldi/rnd_ds/fairseq/examples/speech_recognition/kaldi/add-self-loop-simple'), PosixPath('/hdd/conda_kaldi/exp_unsup_asr/prepare_text/fst/phn_to_phn_sil/HLGa.phn.lm.phones.filtered.06.fst'), PosixPath('/hdd/conda_kaldi/exp_unsup_asr/prepare_text/fst/phn_to_phn_sil/HLG.phn.lm.phones.filtered.06.fst')]' died with <Signals.SIGSEGV: 11>.
Do you have any idea why? @alexeib
you probably ran out of memory because your corpus/num of phonemes is too big. you can change the script to prune the lm or create a 4 or 5gram phone lm instead of 6gram, for use in the WFST
Regarding the zero-division error after creating the filtered lexicon: maybe your corpus is too small and you need to use a smaller phone cutoff threshold? (The example in the readme uses 1000, which is suitable for medium to large corpora, but not small ones.)
I tried with a threshold of even 2, but it again gives an error. What should the size of the corpus be? @alexeib
@alexeib thank you for updating the instructions! They look much improved!
I've finally gotten around to looking at them again, and I have a few things I ran into that I wanted to suggest changes about.
In https://github.com/pytorch/fairseq/tree/master/examples/wav2vec/unsupervised#preparation-of-speech-and-text-data, there is this code block:
# create a manifest file for the set original of audio files
python $FAIRSEQ_ROOT/examples/wav2vec/wav2vec_manifest.py /dir/to/save/audio/files --ext wav --dest /path/to/new/train.tsv --valid-percent 0
python scripts/vads.py -r $RVAD_ROOT < /path/to/train.tsv > train.vads
python scripts/remove_silence.py --tsv /path/to/train.tsv --vads train.vads --out /dir/to/save/audio/files
python $FAIRSEQ_ROOT/examples/wav2vec/wav2vec_manifest.py /dir/to/save/audio/files --ext wav --dest /path/to/new/train.tsv --valid-percent 0.01
I think that the middle two lines could be updated to include $FAIRSEQ_ROOT as well, like so:
python $FAIRSEQ_ROOT/examples/wav2vec/unsupervised/scripts/vads.py -r $RVAD_ROOT < /path/to/train.tsv > train.vads
python $FAIRSEQ_ROOT/examples/wav2vec/unsupervised/scripts/remove_silence.py --tsv /path/to/train.tsv --vads train.vads --out /dir/to/save/audio/files
The --dest arg on this command may be misleading. It actually doesn't want the path to a new file; it wants the path to a directory in which it will create the new train.tsv.
python $FAIRSEQ_ROOT/examples/wav2vec/wav2vec_manifest.py /dir/to/save/audio/files --ext wav --dest /path/to/new/train.tsv --valid-percent 0
When I gave it the path to the directory ./foo/, it created ./foo/train.tsv in that directory.
On the other hand, when I gave it the path ./bar/train.tsv, it created the directory ./bar/train.tsv/ and created train.tsv inside that, so I ended up with ./bar/train.tsv/train.tsv.
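So the intended usage is presumably something like this (a sketch; the manifest directory name here is just illustrative):
python $FAIRSEQ_ROOT/examples/wav2vec/wav2vec_manifest.py /dir/to/save/audio/files --ext wav --dest /path/to/manifest_dir --valid-percent 0
# produces /path/to/manifest_dir/train.tsv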
To avoid issues with spaces in paths, it's often helpful to just wrap all bash variables in quotes, e.g. "$FAIRSEQ_ROOT".
I think these might help the instructions be even easier for people to follow along with. Any thoughts?
those are all great suggestions. if you would like to submit a PR to improve docs, you are most welcome to do so! otherwise i will keep this in mind when i next touch w2v-u code
Roger! I'm still going through and taking notes, but I think I might be able to contribute some stuff.
Here's a new note: I just discovered that faiss and npy-append-array are also dependencies for the preprocessing. They're used in prepare_audio.sh.
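Installing them is presumably just the following (note that the pip package for faiss is usually faiss-cpu or faiss-gpu, or it can come from conda):
pip install npy-append-array faiss-cpu   # or faiss-gpu / conda install faiss, depending on your setup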
Another thing I just ran across: IndexError: list index out of range in wav2vec_cluster_faiss.py.
I think it's caused by the fact that I was using https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_small.pt, which may simply just have 12 layers. I edited the script to print out the length of the relevant variables...
print(f"res length: " + str(len(res["layer_results"])))
print(f"self.layer: {self.layer}")
and I get an output like so.
res length: 12
self.layer: 14
0%| | 0/1497 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/jovyan/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_cluster_faiss.py", line 219, in <module>
main()
File "/home/jovyan/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_cluster_faiss.py", line 153, in main
for f in tqdm.tqdm(iterator, total=num):
File "/opt/conda/envs/wav2vecu/lib/python3.7/site-packages/tqdm/std.py", line 1178, in __iter__
for obj in iterable:
File "/home/jovyan/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_cluster_faiss.py", line 132, in iterate
feats = reader.get_feats(fname)
File "/home/jovyan/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_cluster_faiss.py", line 110, in get_feats
res_layer = res["layer_results"][self.layer]
IndexError: list index out of range
Gonna try again with the larger model, https://dl.fbaipublicfiles.com/fairseq/wav2vec/libri960_big.pt
Alas, when I try that, it fails in the extract_features step:
Traceback (most recent call last):
File "/home/jovyan/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_extract_features.py", line 119, in <module>
main()
File "/home/jovyan/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_extract_features.py", line 107, in main
generator, num = get_iterator(args)
File "/home/jovyan/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_extract_features.py", line 76, in get_iterator
reader = Wav2VecFeatureReader(args.checkpoint, args.layer)
File "/home/jovyan/fairseq/examples/wav2vec/unsupervised/scripts/wav2vec_extract_features.py", line 39, in __init__
[cp_file]
File "/home/jovyan/fairseq/fairseq/checkpoint_utils.py", line 446, in load_model_ensemble_and_task
model = task.build_model(cfg.model)
File "/home/jovyan/fairseq/fairseq/tasks/audio_pretraining.py", line 294, in build_model
model = super().build_model(model_cfg)
File "/home/jovyan/fairseq/fairseq/tasks/fairseq_task.py", line 324, in build_model
model = models.build_model(cfg, self)
File "/home/jovyan/fairseq/fairseq/models/__init__.py", line 96, in build_model
return model.build_model(cfg, task)
File "/home/jovyan/fairseq/fairseq/models/wav2vec/wav2vec2_asr.py", line 176, in build_model
w2v_encoder = Wav2VecEncoder(cfg, len(task.target_dictionary))
File "/home/jovyan/fairseq/fairseq/tasks/audio_pretraining.py", line 267, in target_dictionary
return self.state.target_dictionary
File "/home/jovyan/fairseq/fairseq/tasks/fairseq_task.py", line 41, in __getattr__
self._state[name] = self._factories[name]()
File "/home/jovyan/fairseq/fairseq/tasks/audio_pretraining.py", line 178, in load_target_dictionary
return Dictionary.load(dict_path)
File "/home/jovyan/fairseq/fairseq/data/dictionary.py", line 225, in load
d.add_from_file(f)
File "/home/jovyan/fairseq/fairseq/data/dictionary.py", line 238, in add_from_file
raise fnfe
File "/home/jovyan/fairseq/fairseq/data/dictionary.py", line 235, in add_from_file
with open(PathManager.get_local_path(f), "r", encoding="utf-8") as fd:
FileNotFoundError: [Errno 2] No such file or directory: '/checkpoint/abaevski/data/speech/libri/960h/wav2vec/raw/dict.ltr.txt'
Tried again with xlsr_53_56k.pt, and I don't get the FileNotFoundError! Also, the length of res["layer_results"] for that model is 15. So it definitely seems that using wav2vec small was why I had the IndexError.
Tried tracing the FileNotFoundError back, and it seems that when loading in https://dl.fbaipublicfiles.com/fairseq/wav2vec/libri960_big.pt, it actually contains the following key/value pair within it:
'data': '/checkpoint/abaevski/data/speech/libri/960h/wav2vec/raw/'
whereas https://dl.fbaipublicfiles.com/fairseq/wav2vec/xlsr_53_56k.pt does not.
I went to load_checkpoint_to_cpu()
in checkpoint_utils, and added some print statements to see what's in there, right before the return statement.
When I load the XLSR 53 pretrained model and look at state["cfg"]["task"]
I see
{'_name': 'audio_pretraining', 'data': '/private/home/aconneau/projects/XLSR/MLS/53bis/', 'labels': None, 'sample_rate': 16000, 'normalize': True, 'enable_padding': False, 'max_sample_size': 320000, 'min_sample_size': 32000, 'eval_wer': False, 'eval_wer_config': {'_name': None, 'beam': 5, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 200, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': 'hard', 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False}, 'eval_wer_tokenizer': None, 'eval_wer_post_process': 'letter', 'autoregressive': False}
whereas for 960h, we get
{'_name': 'audio_pretraining', 'data': '/checkpoint/abaevski/data/speech/libri/960h/wav2vec/raw/', 'labels': 'ltr', 'binarized_dataset': False, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_sample_size': None, 'min_sample_size': None, 'eval_wer': False, 'eval_wer_config': {'_name': None, 'beam': 5, 'nbest': 1, 'max_len_a': 0.0, 'max_len_b': 200, 'min_len': 1, 'match_source_len': False, 'unnormalized': False, 'no_early_stop': False, 'no_beamable_mm': False, 'lenpen': 1.0, 'unkpen': 0.0, 'replace_unk': None, 'sacrebleu': False, 'score_reference': False, 'prefix_size': 0, 'no_repeat_ngram_size': 0, 'sampling': False, 'sampling_topk': -1, 'sampling_topp': -1.0, 'constraints': None, 'temperature': 1.0, 'diverse_beam_groups': -1, 'diverse_beam_strength': 0.5, 'diversity_rate': -1.0, 'print_alignment': None, 'print_step': False, 'lm_path': None, 'lm_weight': 0.0, 'iter_decode_eos_penalty': 0.0, 'iter_decode_max_iter': 10, 'iter_decode_force_max_iter': False, 'iter_decode_with_beam': 1, 'iter_decode_with_external_reranker': False, 'retain_iter_history': False, 'retain_dropout': False, 'retain_dropout_modules': None, 'decoding_format': None, 'no_seed_provided': False}, 'eval_wer_tokenizer': None, 'eval_wer_post_process': 'letter', 'autoregressive': False, 'num_batch_buckets': 0, 'precompute_mask_indices': False, 'inferred_w2v_config': None, 'tpu': True}
which look very similar! So why does one succeed, and the other fail?
Ah, one has "labels", the other does not.
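If it helps anyone else poke at this, the task config baked into a checkpoint can be dumped without editing fairseq at all; a rough one-liner (older checkpoints may store this under 'args' instead of 'cfg'):
python -c "import torch; print(torch.load('libri960_big.pt', map_location='cpu')['cfg']['task'])"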
My Question:
How can I get prepare_text.sh running correctly in a fresh Ubuntu Jupyterlab environment? What needs to be installed, what variables set, etc.?
I've run into various issues attempting to run the script prepare_text.sh, from https://github.com/pytorch/fairseq/blob/master/examples/wav2vec/unsupervised/scripts/prepare_text.sh.
Right now, I'm stuck on:
preprocess.py: error: unrecognized arguments: --dict-only
but I've run into some other errors that I've had to work around, detailed below.
Full current output:
After getting through all the other issues I detail below, currently this is what I see when I attempt to run the script.
I cloned the https://github.com/pytorch/fairseq.git repo, and navigated to the scripts folder: https://github.com/pytorch/fairseq/tree/master/examples/wav2vec/unsupervised/scripts before running this.
Fixed (?) Problem: Can't seem to run it from the same folder as the README (workaround: run from scripts folder)
First, I can't run it from the same folder as the README at https://github.com/pytorch/fairseq/tree/master/examples/wav2vec/unsupervised#preparation-of-speech-and-text-data says to. If you try doing so, you get errors with, e.g. path not found to other scripts.
Fixed (?) Problem: "ValueError: lid.187.bin cannot be opened for loading!" (workaround: use lid.176.bin instead)
Solution: download a different language ID model, and edit the code to use it.
https://fasttext.cc/docs/en/language-identification.html has a different model, lid.176.bin; download it and edit the relevant portion of normalize_and_filter_text.py to point at it.
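If it's useful, the direct download link from that page (URL as of writing) is:
wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin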
Fixed (?) Problem: dependencies needed (phonemizer, fasttext, fairseq)
The script does not list which dependencies are needed. So far I've determined that phonemizer, fasttext are needed, and I think fairseq too. Any more I'm missing?
Fixed (?) Problem: can't find files in fairseq_cli (solution: you need to set an environment variable, FAIRSEQ_ROOT).
I set this to point to the top level of the cloned repo; not sure if that's right. (I cloned the repo to ~/work/fairseq/.)
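For what it's worth, that amounts to (given the clone location above):
export FAIRSEQ_ROOT=~/work/fairseq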
Fixed (?) Problem: Not sure what language code to use (guessed sw).
I've got Swahili data. Not sure whether to use sw, or swahili, or what; I assume I should pick from https://github.com/espeak-ng/espeak-ng/blob/master/docs/languages.md
Code
Here's the command I use to invoke the script. Other than editing the default langid model, I haven't edited anything else in the repo, should be the same as https://github.com/pytorch/fairseq/tree/master/examples/wav2vec/unsupervised/scripts. git log shows c47a9b2eef0f41b0564c8daf52cb82ea97fc6548 as the commit.
What have you tried?
What's your environment?
I'm in a Jupyterlab in a Docker container, running Ubuntu.
OS is Ubuntu 20.04.2:
pip list:
conda list:
I also apt-installed phonemizer dependencies:
And finally, here's what I get from apt list | grep installed: apt-list.txt