fgnt / sms_wsj

SMS-WSJ: Spatialized Multi-Speaker Wall Street Journal database for multi-channel source separation and recognition
MIT License

How to decode the separated audios #23

Closed: quancs closed this issue 1 year ago

quancs commented 1 year ago

Hello. I have trained the ASR model with sms_wsj.train_baseline_asr. Is there any guidance on getting the WER of the separated audio with the trained ASR model? It seems that sms_wsj/kaldi/get_kaldi_wer.py can do that, but I don't know how to prepare my separated results (e.g. directory structure or other data) to meet the requirements of this script.

Thanks in advance.

boeddeker commented 1 year ago


At the beginning of the file sms_wsj/kaldi/get_kaldi_wer.py there are several examples of how it can be used, e.g.:

python -m sms_wsj.kaldi.get_kaldi_wer -F /EXP/DIR decode with kaldi_data_dir=/KALDI/DATA/DIR model_egs_dir=/MODEL/EGS/DIR dataset=test_eval92

where /EXP/DIR is the working/output directory, /KALDI/DATA/DIR a directory in Kaldi data format, /MODEL/EGS/DIR the path to the trained model, and test_eval92 the dataset, i.e. a folder in /KALDI/DATA/DIR.

quancs commented 1 year ago

It seems the dataset parameter is not valid, as it is neither defined nor used in get_kaldi_wer.py.

quancs commented 1 year ago

And if I choose to follow the command python -m sms_wsj.kaldi.get_kaldi_wer -F /EXP/DIR with audio_dir=/AUDIO/DIR json_path=/JSON/PATH model_egs_dir=/MODEL/EGS/DIR, which json file should I provide?

boeddeker commented 1 year ago

> Seems the dataset parameter is not valid as it is not defined and used in get_kaldi_wer.py

Sorry, the signature was changed and no one checked the examples at the beginning of the file. For decode, kaldi_data_dir and dataset have been replaced by dataset_dir.
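A minimal sketch of the corrected decode call, built as an argument list that can be passed to Python's subprocess.run. It assumes that dataset_dir is the dataset folder inside the Kaldi data directory (i.e. /KALDI/DATA/DIR/test_eval92); this is an inference from the renaming above, not confirmed by the script. The helper name build_decode_argv is hypothetical.

```python
from pathlib import Path


def build_decode_argv(exp_dir, kaldi_data_dir, dataset, model_egs_dir):
    """Build the argv for the 'decode' command of sms_wsj.kaldi.get_kaldi_wer.

    Hypothetical helper. Assumption: dataset_dir = kaldi_data_dir / dataset,
    replacing the old kaldi_data_dir=... dataset=... pair.
    """
    dataset_dir = Path(kaldi_data_dir) / dataset
    return [
        "python", "-m", "sms_wsj.kaldi.get_kaldi_wer",
        "-F", str(exp_dir),
        "decode", "with",
        f"dataset_dir={dataset_dir}",
        f"model_egs_dir={model_egs_dir}",
    ]


if __name__ == "__main__":
    argv = build_decode_argv("/EXP/DIR", "/KALDI/DATA/DIR", "test_eval92", "/MODEL/EGS/DIR")
    # Show the equivalent shell command; subprocess.run(argv, check=True) would execute it.
    print(" ".join(argv))
```

Passing a list instead of a shell string means no shell quoting is involved, which matters once values contain spaces or special characters.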

> And if I choose to follow the command python -m sms_wsj.kaldi.get_kaldi_wer -F /EXP/DIR with audio_dir=/AUDIO/DIR json_path=/JSON/PATH model_egs_dir=/MODEL/EGS/DIR, which json file should I provide?

The json_path is the path to sms_wsj.json. In {audio_dir}/cv_dev93 and {audio_dir}/test_eval92, the code searches for files named {id}_{spk}.wav, where id is the example_id and spk is 0 or 1. (The naming can be changed with id_to_file_name, but this requires proper escaping for the shell.)
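A minimal sketch for checking the separated audio before starting a long decode. The lookup order (first {audio_dir}/{dataset}/{id}_{spk}.wav, then {audio_dir}/{id}_{spk}.wav) is inferred from the two paths in the AssertionError traceback later in this thread, and the JSON layout {"datasets": {name: {example_id: ...}}} is an assumption about sms_wsj.json. The function names are hypothetical.

```python
import json
import sys
from pathlib import Path

SPEAKERS = (0, 1)  # spk suffix appended to the example_id


def candidate_paths(audio_dir, dataset, example_id, spk):
    """Return the paths assumed to be tried, in order, for one separated signal."""
    name = f"{example_id}_{spk}.wav"  # default {id}_{spk}.wav naming
    root = Path(audio_dir)
    # Inferred from the traceback: first {audio_dir}/{dataset}/..., then {audio_dir}/...
    return [root / dataset / name, root / name]


def find_missing(audio_dir, json_path, datasets=("cv_dev93", "test_eval92")):
    """List (dataset, example_id, spk) triples with no matching wav file.

    Assumes sms_wsj.json is laid out as {"datasets": {name: {example_id: {...}}}}.
    """
    with open(json_path) as f:
        database = json.load(f)
    missing = []
    for dataset in datasets:
        for example_id in database["datasets"][dataset]:
            for spk in SPEAKERS:
                paths = candidate_paths(audio_dir, dataset, example_id, spk)
                if not any(p.exists() for p in paths):
                    missing.append((dataset, example_id, spk))
    return missing


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_audio_dir.py AUDIO_DIR JSON_PATH
    for dataset, example_id, spk in find_missing(sys.argv[1], sys.argv[2]):
        print(f"missing: {dataset} {example_id} speaker {spk}")
```

Judging from that traceback, audio_dir should be the parent folder that contains test_eval92/ and cv_dev93/, not one of the dataset folders itself.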

quancs commented 1 year ago

@boeddeker Thank you for your answer. I will try. ^^

quancs commented 1 year ago

Hello, I'm back. I tried: python -m sms_wsj.kaldi.get_kaldi_wer -F exp with audio_dir=/data/quancs/datasets/sms_wsj/early/test_eval92/ json_path=/data/quancs/datasets/sms_wsj/sms_wsj.json model_egs_dir=$KALDI_ROOT/egs/sms_single_speaker/s5/, but it reported the error below. I don't know if I am missing something.

root@36b3ff17d8c5:~/projects/sms_wsj# python -m sms_wsj.kaldi.get_kaldi_wer -F exp with audio_dir=/data/quancs/datasets/sms_wsj/early/test_eval92/ json_path=/data/quancs/datasets/sms_wsj/sms_wsj.json model_egs_dir=$KALDI_ROOT/egs/sms_single_speaker/s5/
INFO - Kaldi array - Running command 'run'
INFO - Kaldi array - Started run with ID "1"
Create /root/projects/sms_wsj/exp directory
Create data dir for sms_enh/test_eval92 data
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/utt2spk is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/text is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/wav.scp is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/spk2gender is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/utt2dur is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/reco2dur is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 2664 utterances.
fix_data_dir.sh: old files are kept in /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/.backup
steps/make_mfcc.sh --nj 48 --mfcc-config /root/projects/sms_wsj/exp/conf/mfcc_hires.conf --cmd run.pl /root/projects/sms_wsj/exp/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/make_mfcc /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/mfcc
utils/validate_data_dir.sh: Successfully validated data-directory /root/projects/sms_wsj/exp/data/sms_enh/test_eval92
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for test_eval92
steps/compute_cmvn_stats.sh /root/projects/sms_wsj/exp/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/make_mfcc /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/mfcc
Succeeded creating CMVN stats for test_eval92
fix_data_dir.sh: kept all 2664 utterances.
fix_data_dir.sh: old files are kept in /root/projects/sms_wsj/exp/data/sms_enh/test_eval92/.backup
Directory /root/projects/sms_wsj/exp/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92 not found, estimating ivectors
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --nj 48 /root/projects/sms_wsj/exp/data/sms_enh/test_eval92 /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/nnet3/extractor /root/projects/sms_wsj/exp/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92
utils/split_scp.pl: Refusing to split data because number of speakers 8 is less than the number of output .scp files 48
ERROR - Kaldi array - Failed after 0:00:15!
Traceback (most recent calls WITHOUT Sacred internals):
  File "/root/projects/sms_wsj/sms_wsj/kaldi/get_kaldi_wer.py", line 324, in run
  File "/root/projects/sms_wsj/sms_wsj/kaldi/get_kaldi_wer.py", line 145, in decode
    ivector_dir = calculate_ivectors(
  File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 372, in calculate_ivectors
  File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 413, in run_process
  File "/root/miniconda3/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['steps/online/nnet2/extract_ivectors_online.sh', '--cmd', 'run.pl', '--nj', '48', '/root/projects/sms_wsj/exp/data/sms_enh/test_eval92', '/root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/nnet3/extractor', '/root/projects/sms_wsj/exp/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92']' returned non-zero exit status 1.
boeddeker commented 1 year ago

Kaldi complains that --nj 48 (the number of jobs/workers) is too high. Kaldi splits the data by speaker and fails if nj is larger than the number of speakers (8 here, see the split_scp.pl message). In https://github.com/fgnt/sms_wsj/pull/24 I pushed a fix, so the new default is min(8, os.cpu_count()) instead of os.cpu_count().

Alternatively, you could also change the number of jobs on the command line with num_jobs=8.

Sorry, we didn't notice this because we ran the code on machines with 8 cores.
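A minimal sketch of the constraint behind this error: utils/split_scp.pl splits the data by speaker, so it refuses to create more parts than there are speakers. The hypothetical helper below reads a Kaldi utt2spk file (one "<utterance-id> <speaker-id>" pair per line, e.g. exp/data/sms_enh/test_eval92/utt2spk from the log) and caps the job count at both the speaker count and os.cpu_count(). The value it returns could then be passed as num_jobs=... on the command line.

```python
import os
from pathlib import Path


def count_speakers(utt2spk_path):
    """Count distinct speaker ids in a Kaldi utt2spk file ("<utt-id> <spk-id>" per line)."""
    speakers = set()
    for line in Path(utt2spk_path).read_text().splitlines():
        fields = line.split()
        if len(fields) >= 2:  # skip empty or malformed lines
            speakers.add(fields[1])
    return len(speakers)


def safe_num_jobs(utt2spk_path, max_jobs=None):
    """Largest job count this data dir can be split into (hypothetical helper)."""
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    limit = cpus if max_jobs is None else min(cpus, max_jobs)
    return max(1, min(limit, count_speakers(utt2spk_path)))
```

The fixed default from the PR, min(8, os.cpu_count()), avoids reading the data directory and fits the 8 speakers reported in the log for test_eval92.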

quancs commented 1 year ago

Nice! Decoding now. Many thanks for your patient help. 😀

boeddeker commented 1 year ago


quancs commented 1 year ago

Decoding has finished. The printed WER is 7.02 for early/test_eval92. Is this WER correct? It doesn't match any of the numbers in your paper. The script also reported an error (maybe it doesn't matter).

root@36b3ff17d8c5:~/projects/sms_wsj# python -m sms_wsj.kaldi.get_kaldi_wer -F exp3 with audio_dir=/data/quancs/datasets/sms_wsj/early/test_eval92/ json_path=/data/quancs/datasets/sms_wsj/sms_wsj.json model_egs_dir=$KALDI_ROOT/egs/sms_single_speaker/s5/
INFO - Kaldi array - Running command 'run'
INFO - Kaldi array - Started run with ID "1"
Create /root/projects/sms_wsj/exp3 directory
Create data dir for sms_enh/test_eval92 data
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/utt2spk is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/text is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/wav.scp is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/spk2gender is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/utt2dur is not in sorted order or not unique, sorting it
utils/fix_data_dir.sh: file /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/reco2dur is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 2664 utterances.
fix_data_dir.sh: old files are kept in /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/.backup
steps/make_mfcc.sh --nj 8 --mfcc-config /root/projects/sms_wsj/exp3/conf/mfcc_hires.conf --cmd run.pl /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/make_mfcc /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/mfcc
utils/validate_data_dir.sh: Successfully validated data-directory /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for test_eval92
steps/compute_cmvn_stats.sh /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/make_mfcc /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/mfcc
Succeeded creating CMVN stats for test_eval92
fix_data_dir.sh: kept all 2664 utterances.
fix_data_dir.sh: old files are kept in /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92/.backup
Directory /root/projects/sms_wsj/exp3/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92 not found, estimating ivectors
steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --nj 8 /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/nnet3/extractor /root/projects/sms_wsj/exp3/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92
steps/online/nnet2/extract_ivectors_online.sh: extracting iVectors
steps/online/nnet2/extract_ivectors_online.sh: combining iVectors across jobs
steps/online/nnet2/extract_ivectors_online.sh: done extracting (online) iVectors to /root/projects/sms_wsj/exp3/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92 using the extractor in /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/nnet3/extractor.
steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 --extra-left-context 0 --extra-right-context 0 --extra-left-context-initial 0 --extra-right-context-final 0 --frames-per-chunk 140 --nj 8 --cmd run.pl --online-ivector-dir /root/projects/sms_wsj/exp3/exp/sms_single_speaker/nnet3/ivectors_sms_enh_test_eval92 /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/chain/tree_a_sp/graph_tgpr /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92
steps/nnet3/decode.sh: feature type is raw
steps/diagnostic/analyze_lats.sh --cmd run.pl --iter final /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/chain/tree_a_sp/graph_tgpr /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92
steps/diagnostic/analyze_lats.sh: see stats in /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92/log/analyze_alignments.log
Overall, lattice depth (10,50,90-percentile)=(1,3,19) and mean=8.9
steps/diagnostic/analyze_lats.sh: see stats in /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92/log/analyze_lattice_depth_stats.log
score best paths
local/score.sh --cmd run.pl /root/projects/sms_wsj/exp3/data/sms_enh/test_eval92 /root/projects/kaldi/egs/sms_single_speaker/s5/exp/sms_single_speaker/chain/tree_a_sp/graph_tgpr /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92
local/score.sh: scoring with word insertion penalty=0.0,0.5,1.0
score confidence and timing with sclite
Decoding done.
%WER 7.02 [ 3168 / 45144, 494 ins, 379 del, 2295 sub ] /root/projects/sms_wsj/exp3/exp/sms_single_speaker/tdnn1a_sp/decode_sms_enh_test_eval92/wer_9_1.0

Create data dir for sms_enh/cv_dev93 data
ERROR - Kaldi array - Failed after 0:05:40!
Traceback (most recent calls WITHOUT Sacred internals):
  File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 105, in _get_wav_command_for_audio_dir
    assert audio_path.exists(), audio_path
AssertionError: /data/quancs/datasets/sms_wsj/early/test_eval92/cv_dev93/0_4k6c0303_4k4c0319_0.wav

During handling of the above exception, another exception occurred:

Traceback (most recent calls WITHOUT Sacred internals):
  File "/root/projects/sms_wsj/sms_wsj/kaldi/get_kaldi_wer.py", line 303, in run
  File "/root/projects/sms_wsj/sms_wsj/kaldi/get_kaldi_wer.py", line 66, in create_dir
  File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 170, in create_data_dir_from_audio_dir
  File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 246, in _create_data_dir
    example_id_to_wav[example_id] = get_wav_command_fn(
  File "/root/projects/sms_wsj/sms_wsj/kaldi/utils.py", line 108, in _get_wav_command_for_audio_dir
    assert audio_path.exists(), audio_path
AssertionError: /data/quancs/datasets/sms_wsj/early/test_eval92/0_4k6c0303_4k4c0319_0.wav
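The %WER line in the log above can be checked by hand: the word error rate is the number of errors (insertions + deletions + substitutions) divided by the number of reference words. In Kaldi's usual scoring layout, the suffix of wer_9_1.0 denotes the LM weight (9) and word insertion penalty (1.0) that gave the best result. A small worked check with the numbers from that line:

```python
def wer_percent(ins, dels, subs, ref_words):
    """Word error rate in percent: (I + D + S) / N * 100."""
    return 100.0 * (ins + dels + subs) / ref_words


errors = 494 + 379 + 2295  # ins + del + sub from the %WER line
print(errors)  # 3168, matching "3168 / 45144"
print(f"{wer_percent(494, 379, 2295, 45144):.2f}")  # 7.02
```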
quancs commented 1 year ago

Also, I used the Kaldi version mentioned in the README, which says the script has been tested with the KALDI Git hash "7637de77e0a77bf280bef9bf484e4f37c4eb9475".

boeddeker commented 1 year ago

> Decoding is over now. The WER is printed (7.02 for early/test_eval92). Is this WER correct as it doesn't match any one in your paper? [...] And, I used KALDI version mentioned in the README [...]

I don't know how much the performance varies between training runs of the ASR model; I never trained it myself. Maybe you were lucky and got a good seed. The reference number in the paper is 7.34 % WER, but the difference looks a bit too large for that.

> And, it also reported an error (maybe it doesn't matter).

The error happened because the code also tried to decode "cv_dev93". I will check the example commands in the file and fix them. Strangely, nobody has reported that they don't work.

quancs commented 1 year ago

> I don't know the variance of the performance, when training the ASR model. I never trained it myself. Maybe you were lucky and got a good seed. The reference number is 7.34 % WER in the paper. But it looks a bit too much.

I get WER = 6.85 for speech_source/test_eval92/, slightly worse than the 6.8 reported in the paper. I guess the ASR model is trained on $x_d+n_d$ by default. So I have another question: if I want to train the ASR model on direct-path signals (i.e. $s$ in the paper), which parameters should I pass to train_baseline_asr.py?

boeddeker commented 1 year ago

The train_baseline_asr.py script has the option train_data_type. The default, sms_single_speaker, corresponds to $x_d+n_d$. You can change it to speech_source or original_source, where original_source is the original WSJ file and speech_source is the padded WSJ file. When you change train_data_type, you should also set ali_data_type to the same value: for original_source this is required, and for speech_source it is recommended.
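A minimal sketch of the rule above, as a hypothetical helper that assembles Sacred-style "with key=value" updates for sms_wsj.train_baseline_asr. Only train_data_type, ali_data_type and their three values come from this thread; any other options your setup needs are passed through unchanged and are not validated here. When train_data_type is left at its default, ali_data_type is not touched, since the script's own default for that case is not stated in the thread.

```python
import warnings

DEFAULT_TYPE = "sms_single_speaker"  # x_d + n_d
CHANGED_TYPES = {"speech_source", "original_source"}


def train_asr_updates(train_data_type=None, ali_data_type=None, **other):
    """Return Sacred 'with' updates for sms_wsj.train_baseline_asr (hypothetical helper)."""
    updates = {}
    if train_data_type is not None:
        if train_data_type not in {DEFAULT_TYPE, *CHANGED_TYPES}:
            raise ValueError(f"unknown train_data_type: {train_data_type!r}")
        updates["train_data_type"] = train_data_type
        if train_data_type in CHANGED_TYPES:
            if ali_data_type is None:
                # Keep alignments consistent with the training data.
                ali_data_type = train_data_type
            elif ali_data_type != train_data_type:
                if train_data_type == "original_source":
                    raise ValueError("original_source requires ali_data_type=original_source")
                warnings.warn("ali_data_type=speech_source is recommended for speech_source")
    if ali_data_type is not None:
        updates["ali_data_type"] = ali_data_type
    updates.update(other)
    items = [f"{key}={value}" for key, value in updates.items()]
    return ["with", *items] if items else []
```

For example, train_asr_updates(train_data_type="speech_source") yields ["with", "train_data_type=speech_source", "ali_data_type=speech_source"], to be appended to ["python", "-m", "sms_wsj.train_baseline_asr"] together with whatever other options your setup already uses.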

quancs commented 1 year ago

OK. Thank you again for your help and the creation of this dataset! ^_^