fgnt / pb_chime5

Speech enhancement system for the CHiME-5 dinner party scenario
MIT License

Add changes to apply this system to libriCSS #19

Status: Open. boeddeker opened this pull request 4 years ago.

boeddeker commented 4 years ago

I am not yet sure what the best way to provide the files is. For multichannel files like those used in libriCSS, an scp file would be fine, but it does not generalize to multiple files that each contain a single channel.

@keisukekino (CC: @tnakieee) Could you test whether the code works for you?

I ran the code with

mpirun -np 16 python -m pb_chime5.scripts.kaldi_run_rttm_libri_css with storage_dir=exp database_rttm=overlap_ratio_40.0_sil0.1_1.0_session0_actual39.5.rttm job_id=1 number_of_jobs=1

where I replaced mpirun with our HPC command. With 103 cores, it took 4:40 minutes.

aarora8 commented 3 years ago

Hi, thank you for this pull request, which makes it possible to run guided source separation on datasets other than CHiME. I am trying to run it on a different dataset that has six WAV files per session. I created the RTTM file and the session_to_audio_paths JSON file.

My session_to_audio_paths JSON file is as follows:

{ "session_id": '[{audio_path}/session_id_ch1.wav, {audio_path}/session_id_ch2.wav, {audio_path}/session_id_ch3.wav, {audio_path}/session_id_ch4.wav, {audio_path}/session_id_ch5.wav, {audio_path}/session_id_ch6.wav]' }

My RTTM file is as follows:

SPEAKER session_id 1 0.45 0.55 <_NA> <_NA> Speaker_id <_NA>
SPEAKER session_id 1 0.94 0.70 <_NA> <_NA> Speaker_id <_NA>
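For reference, SPEAKER lines like the ones above can be grouped per session and speaker with a few lines of Python. This is a minimal sketch, not the parser in pb_chime5's rttm.py (which may handle more cases). It only assumes the usual whitespace-separated RTTM layout: session ID in field 2, onset and duration in seconds in fields 4 and 5, speaker name in field 8. parse_rttm is a hypothetical helper.

```python
from collections import defaultdict


def parse_rttm(lines):
    """Group RTTM SPEAKER records into {session: {speaker: [(start_s, end_s), ...]}}.

    Assumed field layout (whitespace separated, 1-based):
    1 type, 2 session/file id, 3 channel, 4 onset [s], 5 duration [s], ..., 8 speaker name.
    """
    segments = defaultdict(lambda: defaultdict(list))
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-speaker records
        if len(fields) < 8:
            raise ValueError(f"Expected at least 8 fields, got {len(fields)}: {line!r}")
        session_id = fields[1]
        onset = float(fields[3])
        duration = float(fields[4])
        speaker = fields[7]
        segments[session_id][speaker].append((onset, onset + duration))
    # Convert nested defaultdicts to plain dicts so the result prints cleanly.
    return {session: dict(speakers) for session, speakers in segments.items()}


rttm = [
    "SPEAKER session_id 1 0.45 0.55 <_NA> <_NA> Speaker_id <_NA>",
    "SPEAKER session_id 1 0.94 0.70 <_NA> <_NA> Speaker_id <_NA>",
]
print(parse_rttm(rttm))
```

Printing the parsed result is a quick way to check that the session IDs in the RTTM match the keys in the session_to_audio_paths JSON.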

However, I get the following error. Can you please help me with it?

File "pb_chime5/pb_chime5/database/chime5/rttm.py", line 474, in data
    audio_path = self._audio_paths[session_id]
TypeError: list indices must be integers or slices, not str

Since your comment above mentions that "it does not generalize to multiple files", I wanted to check whether what I did above needs some additional changes as well.

I also tried merging the files into a single multichannel file, which gave the following new JSON file:

{ 'session_id': '{audio_path}/session_id.wav' }

However, I get the same error:

File "pb_chime5/pb_chime5/database/chime5/rttm.py", line 474, in data
    audio_path = self._audio_paths[session_id]
TypeError: list indices must be integers or slices, not str

Thanks, Ashish
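Merging the six per-channel files into one multichannel file, as tried above, can be sketched with Python's standard wave module. This is a minimal illustration, not part of pb_chime5; merge_mono_wavs is a hypothetical helper, and it assumes PCM WAV inputs that all share sample width, sample rate and length.

```python
import wave
from contextlib import ExitStack


def merge_mono_wavs(input_paths, output_path):
    """Interleave single-channel PCM WAV files into one multichannel WAV.

    Channel order follows input_paths. Inputs must agree in sample width,
    sample rate and number of frames.
    """
    if not input_paths:
        raise ValueError("need at least one input file")
    with ExitStack() as stack:
        readers = [stack.enter_context(wave.open(p, "rb")) for p in input_paths]
        params = [(r.getnchannels(), r.getsampwidth(), r.getframerate(), r.getnframes())
                  for r in readers]
        if any(p[0] != 1 for p in params):
            raise ValueError("every input file must be mono")
        if len(set(params)) != 1:
            raise ValueError(f"inputs differ in sample width, rate or length: {params}")
        _, width, rate, nframes = params[0]
        channels = [r.readframes(nframes) for r in readers]

    n_ch = len(channels)
    frame_bytes = width * n_ch
    interleaved = bytearray(nframes * frame_bytes)
    # Output frame i holds sample i of channel 0, then channel 1, ...
    # Copy byte b of every sample of channel c in one strided slice assignment.
    for c, data in enumerate(channels):
        for b in range(width):
            interleaved[c * width + b::frame_bytes] = data[b::width]

    with wave.open(output_path, "wb") as out:
        out.setnchannels(n_ch)
        out.setsampwidth(width)
        out.setframerate(rate)
        out.writeframes(bytes(interleaved))
```

With the naming from this thread, a call would look like merge_mono_wavs([f"{audio_path}/session_id_ch{c}.wav" for c in range(1, 7)], f"{audio_path}/session_id.wav"). The strided slice assignment avoids a per-sample Python loop, which matters for hour-long sessions.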

boeddeker commented 3 years ago

Hi,

Thank you for trying this code. Getting the input right is actually the most difficult part, because of the guide (the speaker activity taken from the RTTM) and possibly multiple files per session.

By "does not generalize to multiple files" I meant simple scp files (only the example ID and file path, without sox commands etc.). A JSON file does not have this limitation, and the code should work in both cases. Both of your JSON files look correct.
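A small sketch of writing such a JSON file for both layouts (one multichannel file per session, or one file per channel). The helper name, the /data/audio directory and the _chN naming are hypothetical, taken from the examples in this thread; that pb_chime5 expects exactly this structure is an assumption based on the discussion above. Letting json.dumps do the quoting guarantees the file is valid JSON.

```python
import json
from pathlib import Path


def build_session_to_audio_paths(audio_dir, session_ids, num_channels=None):
    """Map each session ID to its audio file(s).

    num_channels=None -> one multichannel file per session: <audio_dir>/<session>.wav
    num_channels=N    -> N single-channel files: <audio_dir>/<session>_ch1.wav ... _chN.wav
    (The naming scheme is an assumption copied from the examples in this thread.)
    """
    audio_dir = Path(audio_dir)
    mapping = {}
    for session_id in session_ids:
        if num_channels is None:
            mapping[session_id] = str(audio_dir / f"{session_id}.wav")
        else:
            mapping[session_id] = [
                str(audio_dir / f"{session_id}_ch{ch}.wav")
                for ch in range(1, num_channels + 1)
            ]
    return mapping


# Hypothetical directory; six single-channel files per session as in this thread.
mapping = build_session_to_audio_paths("/data/audio", ["session_id"], num_channels=6)
print(json.dumps(mapping, indent=2))
```

Either way, the top-level object is a dictionary keyed by session ID, which is what the lookup self._audio_paths[session_id] needs.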

The error you get looks strange: audio_paths should be a dictionary, not a list. Can you inspect the value of self._audio_paths? I pushed some changes to produce a more verbose error. Alternatively, you can start the script with pdb (e.g. pb_chime5.scripts.kaldi_run_rttm_libri_css --pdb with ...) and inspect the object manually (or use other tricks, such as printing it or raising a new exception as in my modification).
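A minimal sketch of the kind of check described above, which also reproduces why a list triggers exactly this TypeError. load_audio_paths is a hypothetical helper for illustration; it is not the change pushed to this pull request.

```python
import json


def load_audio_paths(json_text):
    """Parse session_to_audio_paths JSON and verify it maps session IDs to path(s)."""
    audio_paths = json.loads(json_text)
    if not isinstance(audio_paths, dict):
        # With a list, audio_paths[session_id] fails with
        # "TypeError: list indices must be integers or slices, not str".
        raise TypeError(
            "Expected a JSON object mapping session IDs to audio paths, "
            f"got {type(audio_paths).__name__}: {audio_paths!r:.200}"
        )
    return audio_paths


# The failure mode from the traceback: indexing a list with a string key.
paths = ["session_id_ch1.wav", "session_id_ch2.wav"]
try:
    paths["session_id"]
except TypeError as err:
    print(err)  # list indices must be integers or slices, not str
```

If a check like this fails on a JSON file that looks correct, the list is probably introduced later, where the loaded data is passed on to the database object.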

Once you post the value of audio_paths, I hope I can help further.