Closed quancs closed 1 year ago
Hello,
at the beginning of the file sms_wsj/kaldi/get_kaldi_wer.py there are several examples of how it can be used, e.g.:
python -m sms_wsj.kaldi.get_kaldi_wer -F /EXP/DIR decode with kaldi_data_dir=/KALDI/DATA/DIR model_egs_dir=/MODEL/EGS/DIR dataset=test_eval92
where /EXP/DIR is the working/output dir, /KALDI/DATA/DIR a dir with "kaldi" data style, /MODEL/EGS/DIR the path to the trained model and test_eval92 is the dataset, i.e. a folder in /KALDI/DATA/DIR.
It seems the dataset parameter is not valid, as it is neither defined nor used in get_kaldi_wer.py.
And if I choose to follow the command python -m sms_wsj.kaldi.get_kaldi_wer -F /EXP/DIR with audio_dir=/AUDIO/DIR json_path=/JSON/PATH model_egs_dir=/MODEL/EGS/DIR, which json file should I provide?
Seems the dataset parameter is not valid as it is not defined and used in get_kaldi_wer.py
Sorry, the signature was changed and no one checked the examples at the beginning of the file. For decode, kaldi_data_dir and dataset are replaced by dataset_dir.
And if I choose to follow the command python -m sms_wsj.kaldi.get_kaldi_wer -F /EXP/DIR with audio_dir=/AUDIO/DIR json_path=/JSON/PATH model_egs_dir=/MODEL/EGS/DIR, which json file should I provide?
The json_path is the path to the sms_wsj.json. In {audio_dir}/[cv_dev93|test_eval92] the code will search for e.g. {id}_{spk}.wav, where id is the example_id and spk is 0 or 1 (this can be changed with id_to_file_name, but requires proper escaping for the shell).
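As a rough illustration of the layout described above, the following sketch builds the file names the script would look for under the default naming and lists the ones that are missing. The helper names, the placeholder example id and the default of two speakers are assumptions made for this sketch; they are not taken from the sms_wsj code.

```python
from pathlib import Path


def expected_wav_paths(audio_dir, dataset, example_id, num_speakers=2):
    """Return one expected path per speaker: {audio_dir}/{dataset}/{id}_{spk}.wav."""
    dataset_dir = Path(audio_dir) / dataset
    return [dataset_dir / f"{example_id}_{spk}.wav" for spk in range(num_speakers)]


def missing_wavs(audio_dir, dataset, example_ids, num_speakers=2):
    """Return all expected wav paths that do not exist on disk."""
    return [
        path
        for example_id in example_ids
        for path in expected_wav_paths(audio_dir, dataset, example_id, num_speakers)
        if not path.exists()
    ]


if __name__ == "__main__":
    # "EXAMPLE_ID" is a placeholder, not a real id from sms_wsj.json.
    for path in expected_wav_paths("/AUDIO/DIR", "test_eval92", "EXAMPLE_ID"):
        print(path)
```

Running such a check before decoding can reveal a wrong audio_dir early, for example when audio_dir already points at one of the dataset folders instead of their parent.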
@boeddeker Thank you for your answer. I will try. ^^
Hello, I'm back. I tried: python -m sms_wsj.kaldi.get_kaldi_wer -F exp with audio_dir=/data/quancs/datasets/sms_wsj/early/test_eval92/ json_path=/data/quancs/datasets/sms_wsj/sms_wsj.json model_egs_dir=$KALDI_ROOT/egs/sms_single_speaker/s5/, but it reported an error (below). I don't know if I missed something.
root@36b3ff17d8c5:~/projects/sms_wsj# python -m sms_wsj.kaldi.get_kaldi_wer -F exp with audio_dir=/data/quancs/datasets/sms_wsj/early/test_eval92/ json_path=/data/quancs/datasets/sms_wsj/sms_wsj.json model_egs_dir=$KALDI_ROOT/egs/sms_single_speaker/s5/
INFO - Kaldi array - Running command 'run'
INFO - Kaldi array - Started run with ID "1"
Create /root/projects/sms_wsj/exp directory
Create data dir for sms_enh/test_eval92 data
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/utt2spk is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/text is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/wav.scp is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/spk2gender is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/utt2dur is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/reco2dur is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 2664 utterances.
fix_data_dir.sh: old files are kept in /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/.backup
steps/make_mfcc.sh --nj 48 --mfcc-config /root/projects/sms_wsj/exp/conf/mfcc_hires.conf --cmd run.pl /root/projects/sms_wsj/exp/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/make_mfcc /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/mfcc
utils/validate_data_dir.sh: Successfully validated data-directory /root/projects/sms_wsj/exp/data/sms_enh/test_eval92
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for test_eval92
steps/compute_cmvn_stats.sh /root/projects/sms_wsj/exp/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/make_mfcc /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/mfcc
Succeeded creating CMVN stats for test_eval92
fix_data_dir.sh: kept all 2664 utterances.
fix_data_dir.sh: old files are kept in /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/.backup
Directory /root/projects/sms_wsj/exp/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92 not found, estimating ivectors
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --nj 48 /root/projects/sms_wsj/exp/data/sms_enh/test_eval92 /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/nnet3/extractor /root/projects/sms_wsj/exp/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92
utils/split_scp.pl: Refusing to split data because number of speakers 8 is less than the number of output .scp files 48
ERROR - Kaldi array - Failed after 0:00:15!
Traceback (most recent calls WITHOUT Sacred internals):
File "/root/projects/sms_wsj/sms_wsj/kaldi/get_kaldi_wer.py", line 324, in run
decode(
File "/root/projects/sms_wsj/sms_wsj/kaldi/get_kaldi_wer.py", line 145, in decode
ivector_dir = calculate_ivectors(
File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 372, in calculate_ivectors
run_process([
File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 413, in run_process
subprocess.run(
File "/root/miniconda3/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['steps/online/nnet2/extract_ivectors_online.sh', '--cmd', 'run.pl', '--nj', '48', '/root/projects/sms_wsj/exp/data/sms_enh/test_eval92', '/root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/nnet3/extractor', '/root/projects/sms_wsj/exp/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92']' returned non-zero exit status 1.
Kaldi complains that --nj 48 (the number of jobs/workers) is too high. Kaldi cannot split speakers and fails if nj is too large.
In https://github.com/fgnt/sms_wsj/pull/24 I pushed a fix, so the new default is min(8, os.cpu_count()) instead of os.cpu_count().
Alternatively, you could also change the number of jobs on the command line with num_jobs=8.
Sorry, we didn't notice this because we ran the code on machines with 8 cores.
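A minimal sketch of how a safe job count could be picked is shown below. The cap of 8 mirrors the new default mentioned above. Bounding the value by the number of speakers as well is an extra assumption of this sketch, motivated by the split_scp.pl message in the log, and the function name choose_num_jobs is hypothetical.

```python
import os


def choose_num_jobs(num_speakers, cap=8, cpu_count=None):
    """Pick a job count that is at most cap, the CPU count and the speaker count."""
    if cpu_count is None:
        # os.cpu_count() may return None, so fall back to a single job.
        cpu_count = os.cpu_count() or 1
    return max(1, min(cap, cpu_count, num_speakers))


if __name__ == "__main__":
    # The failing run used a 48-core machine and data with 8 speakers.
    print(choose_num_jobs(num_speakers=8, cpu_count=48))
```

The resulting value could then be passed on the command line as num_jobs=<value>.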
Nice! Decoding now. Great thanks for your patient help. 😀
Great.
Decoding is over now. The WER is printed (7.02 for early/test_eval92). Is this WER correct? It doesn't match any of the numbers in your paper. It also reported an error (maybe it doesn't matter).
root@36b3ff17d8c5:~/projects/sms_wsj# python -m sms_wsj.kaldi.get_kaldi_wer -F exp3 with audio_dir=/data/quancs/datasets/sms_wsj/early/test_eval92/ json_path=/data/quancs/datasets/sms_wsj/sms_wsj.json model_egs_dir=$KALDI_ROOT/egs/sms_single_speaker/s5/
INFO - Kaldi array - Running command 'run'
INFO - Kaldi array - Started run with ID "1"
Create /root/projects/sms_wsj/exp3 directory
Create data dir for sms_enh/test_eval92 data
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/utt2spk is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/text is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/wav.scp is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/spk2gender is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/utt2dur is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/reco2dur is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 2664 utterances.
fix_data_dir.sh: old files are kept in /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/.backup
steps/make_mfcc.sh --nj 8 --mfcc-config /root/projects/sms_wsj/exp3/conf/mfcc_hires.conf --cmd run.pl /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/make_mfcc /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/mfcc
utils/validate_data_dir.sh: Successfully validated data-directory /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for test_eval92
steps/compute_cmvn_stats.sh /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/make_mfcc /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/mfcc
Succeeded creating CMVN stats for test_eval92
fix_data_dir.sh: kept all 2664 utterances.
fix_data_dir.sh: old files are kept in /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/.backup
Directory /root/projects/sms_wsj/exp3/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92 not found, estimating ivectors
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --nj 8 /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/nnet3/extractor /root/projects/sms_wsj/exp3/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92
steps/online/nnet2/extract_ivectors_online.sh: extracting iVectors
steps/online/nnet2/extract_ivectors_online.sh: combining iVectors across jobs
steps/online/nnet2/extract_ivectors_online.sh: done extracting (online) iVectors to /root/projects/sms_wsj/exp3/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92 using the extractor in /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/nnet3/extractor.
steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 --extra-left-context 0 --extra-right-context 0 --extra-left-context-initial 0 --extra-right-context-final 0 --frames-per-chunk 140 --nj 8 --cmd run.pl --online-ivector-dir /root/projects/sms_wsj/exp3/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92 /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/chain/tree_a_sp/graph_tgpr /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92
steps/nnet3/decode.sh: feature type is raw
steps/diagnostic/analyze_lats.sh --cmd run.pl --iter final /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/chain/tree_a_sp/graph_tgpr /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92
steps/diagnostic/analyze_lats.sh: see stats in /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,3,19) and mean=8.9
steps/diagnostic/analyze_lats.sh: see stats in /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92/log/analyze_lattice_depth_stats.log
score best paths
local/score.sh --cmd run.pl /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/chain/tree_a_sp/graph_tgpr /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92
local/score.sh: scoring with word insertion penalty=0.0,0.5,1.0
score confidence and timing with sclite
Decoding done.
%WER 7.02 [ 3168 / 45144, 494 ins, 379 del, 2295 sub ] /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92/wer_9_1.0
Create data dir for sms_enh/cv_dev93 data
ERROR - Kaldi array - Failed after 0:05:40!
Traceback (most recent calls WITHOUT Sacred internals):
File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 105, in _get_wav_command_for_audio_dir
assert audio_path.exists(), audio_path
AssertionError: /data/quancs/datasets/sms_wsj/early/test_eval92/cv_dev93/0_4k6c0303_4k4c0319_0.wav
During handling of the above exception, another exception occurred:
Traceback (most recent calls WITHOUT Sacred internals):
File "/root/projects/sms_wsj/sms_wsj/kaldi/get_kaldi_wer.py", line 303, in run
create_dir(
File "/root/projects/sms_wsj/sms_wsj/kaldi/get_kaldi_wer.py", line 66, in create_dir
create_data_dir_from_audio_dir(
File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 170, in create_data_dir_from_audio_dir
_create_data_dir(
File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 246, in _create_data_dir
example_id_to_wav[example_id] = get_wav_command_fn(
File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 108, in _get_wav_command_for_audio_dir
assert audio_path.exists(), audio_path
AssertionError: /data/quancs/datasets/sms_wsj/early/test_eval92/0_4k6c0303_4k4c0319_0.wav
Also, I used the KALDI version mentioned in the README: The script has been tested with the KALDI Git hash "7637de77e0a77bf280bef9bf484e4f37c4eb9475".
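The %WER line in the log above can be sanity-checked by hand: the error count is the sum of insertions, deletions and substitutions, and the percentage is that sum divided by the number of reference words. The sketch below parses such a line with a regular expression written for this example (it is not part of Kaldi or sms_wsj) and recomputes the value.

```python
import re

# Pattern for lines like: %WER 7.02 [ 3168 / 45144, 494 ins, 379 del, 2295 sub ]
WER_PATTERN = re.compile(
    r"%WER\s+(?P<wer>[\d.]+)\s+\[\s*(?P<errors>\d+)\s*/\s*(?P<words>\d+),\s*"
    r"(?P<ins>\d+)\s+ins,\s*(?P<dels>\d+)\s+del,\s*(?P<sub>\d+)\s+sub\s*\]"
)


def parse_wer_line(line):
    """Extract the reported WER and the error counts from a Kaldi %WER line."""
    match = WER_PATTERN.search(line)
    if match is None:
        raise ValueError(f"Not a %WER line: {line!r}")
    stats = {key: int(value) for key, value in match.groupdict().items() if key != "wer"}
    stats["wer"] = float(match.group("wer"))
    return stats


if __name__ == "__main__":
    line = "%WER 7.02 [ 3168 / 45144, 494 ins, 379 del, 2295 sub ]"
    stats = parse_wer_line(line)
    recomputed = 100 * stats["errors"] / stats["words"]
    print(stats["ins"] + stats["dels"] + stats["sub"], round(recomputed, 2))
```

For the line from the log, 494 + 379 + 2295 gives 3168 errors, and 3168 / 45144 rounds to the reported 7.02 %.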
Decoding is over now. The WER is printed (7.02 for early/test_eval92). Is this WER correct as it doesn't match any one in your paper? [...] And, I used KALDI version mentioned in the README [...]
I don't know the variance of the performance when training the ASR model; I never trained it myself. Maybe you were lucky and got a good seed. The reference number in the paper is 7.34 % WER. But the difference looks a bit too large.
And, it also reported an error (maybe it doesn't matter).
The error happened because the code also tried to decode "cv_dev93". I will check the commands in the file and fix them. Strangely, nobody reported that they don't work.
I don't know the variance of the performance, when training the ASR model. I never trained it myself. Maybe you were lucky and got a good seed. The reference number is 7.34 % WER in the paper. But it looks a bit too much.
WER = 6.85 for speech_source/test_eval92/. This is slightly worse than the one reported in the paper, i.e. 6.8.
I guess the ASR is configured to be trained on $x_d+n_d$ by default. So, I have another question: if I want to train the ASR model with direct-path signals (i.e. $s$ in the paper), what parameters should I provide for train_baseline_asr.py?
The train_baseline_asr.py script has the option train_data_type. The default sms_single_speaker is the x_d+n_d. You can change it to speech_source or original_source, where original_source is the original WSJ file and speech_source the padded WSJ file.
When you change train_data_type, you should also change ali_data_type to the same value. For original_source you have to change it, and for speech_source it is recommended.
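The rule above can be sketched as a small helper that builds the two key=value overrides and rejects the combination that is described as not allowed. The helper name is hypothetical, and the sketch assumes that train_baseline_asr accepts overrides in the same key=value style as the get_kaldi_wer commands in this thread. Other options the script needs are omitted.

```python
def data_type_overrides(train_data_type, ali_data_type=None):
    """Build train_data_type/ali_data_type overrides, defaulting both to the same value."""
    if ali_data_type is None:
        # Keeping both values equal is required for original_source
        # and recommended for speech_source.
        ali_data_type = train_data_type
    if train_data_type == "original_source" and ali_data_type != train_data_type:
        raise ValueError("original_source requires ali_data_type=original_source")
    return [
        f"train_data_type={train_data_type}",
        f"ali_data_type={ali_data_type}",
    ]


if __name__ == "__main__":
    print(" ".join(data_type_overrides("speech_source")))
```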
OK. Thank you again for your help and the creation of this dataset! ^_^
Hello. I have trained the ASR model with sms_wsj.train_baseline_asr. Is there any guidance on how to get the WER of separated audio using the trained ASR model? It seems that sms_wsj/kaldi/get_kaldi_wer.py can do that, but I don't know how to prepare my separated results (e.g. directory structure or other data) to meet the requirements of this script. Thanks in advance.