Questions about generating simulation data

ooobsidian commented 1 year ago

Hello! Thank you for contributing such excellent code. I am trying to generate a simulated dataset with VoxCeleb2 and have a few questions. First, I ran prepareKaldidata_VoxCeleb2.sh to get Kaldidatadir. Then I ran generate_data.sh to generate the simulated data, but I got the following errors:

/data/source/EEND_dataprep/v2/VoxCeleb2/config_variables.sh: line 19: VAD: No such file or directory
/data/source/EEND_dataprep/v2/VoxCeleb2/config_variables.sh: line 26: simulated: No such file or directory
/data/source/EEND_dataprep/v2/VoxCeleb2/config_variables.sh: line 27: MUSAN: No such file or directory
/data/source/EEND_dataprep/v2/VoxCeleb2/config_variables.sh: line 29: syntax error near unexpected token `newline'
/data/source/EEND_dataprep/v2/VoxCeleb2/config_variables.sh: line 29: `RTTMS_FILE=<A single rttm file with all segments of DIHARD 3 dev full cts>'

I noticed that I need to configure the variables in config_variables.sh, but I don't understand how to set the following fields or how to obtain these files.

SEG_FILE=<VAD directory for set>/segments
RIRS_SCP=<simulated rirs directory>/simulated_rirs_16k/data/wav.scp
NOISES_SCP=<MUSAN directory>/data/musan_noise_bg/wav.scp
RTTMS_FILE=<A single rttm file with all segments of DIHARD 3 dev full cts>

Could you please explain the meaning and usage of the paths in <>? I look forward to hearing from you. Thank you.
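The errors quoted above are what a shell prints when it reads the unedited template: an unquoted `<word` is parsed as an input redirection from a file named `word` (hence "VAD: No such file or directory"), and a line ending in `>` with nothing after it is a syntax error. A minimal sketch of a filled-in version follows, assuming hypothetical directories under `/path/to/data`. Only the four variable names and the path suffixes come from the template; `DATA_ROOT`, the sub-directory names, and the helper `check_config_paths` are illustrative and not part of the repository.

```shell
# Hypothetical root directory; replace it and the sub-directories with your own locations.
DATA_ROOT="${DATA_ROOT:-/path/to/data}"

# Quoted assignments with real paths instead of the <...> placeholders.
SEG_FILE="$DATA_ROOT/VoxCeleb2_vad/segments"
RIRS_SCP="$DATA_ROOT/rirs/simulated_rirs_16k/data/wav.scp"
NOISES_SCP="$DATA_ROOT/musan/data/musan_noise_bg/wav.scp"
RTTMS_FILE="$DATA_ROOT/dihard3_dev_cts.rttm"

# Illustrative helper: report every given path that is missing or empty.
# Returns 0 only if all paths exist and are non-empty.
check_config_paths() {
    status=0
    for path in "$@"; do
        if [ ! -s "$path" ]; then
            echo "missing or empty: $path" >&2
            status=1
        fi
    done
    return "$status"
}

check_config_paths "$SEG_FILE" "$RIRS_SCP" "$NOISES_SCP" "$RTTMS_FILE" \
    || echo "fix the paths above before running generate_data.sh" >&2
```

Running a check like this before generate_data.sh turns a confusing redirection error into a direct message about which file is missing.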

fnlandini commented 1 year ago

Hello, SEG_FILE=<VAD directory for set>/segments should point to the segments file produced by a voice activity detection (VAD) system that you have to run on the data. For example, you can use this one: http://kaldi-asr.org/models/m4
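For reference, a Kaldi-style segments file has one line per speech segment with four fields: utterance ID, recording ID, start time, and end time (in seconds). A small sanity check is sketched below; the function name `check_segments` and the example IDs are made up for illustration, and the check only covers the field count and that each end time is after its start time.

```shell
# Illustrative helper: print malformed lines and return non-zero if any are found.
# Expected line format: <utterance-id> <recording-id> <start-seconds> <end-seconds>
check_segments() {
    awk 'NF != 4 || $3 + 0 < 0 || $4 + 0 <= $3 + 0 { bad++; print "line " NR ": " $0 }
         END { exit (bad > 0) }' "$1"
}

# Tiny example file with hypothetical IDs.
example_segments=$(mktemp)
cat > "$example_segments" <<'EOF'
id00012-abc-0000 id00012-abc 0.00 3.25
id00012-abc-0001 id00012-abc 4.10 9.80
EOF

check_segments "$example_segments" && echo "segments look well-formed"
```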

RIRS_SCP=<simulated rirs directory>/simulated_rirs_16k/data/wav.scp should point to the room impulse responses. You can download them from http://www.openslr.org/resources/26/sim_rir_16k.zip

NOISES_SCP=<MUSAN directory>/data/musan_noise_bg/wav.scp should point to the background noises from MUSAN, which you can get from https://github.com/hitachi-speech/EEND/blob/master/egs/mini_librispeech/v1/musan_bgnoise.tar.gz

RTTMS_FILE=<A single rttm file with all segments of DIHARD 3 dev full cts> should point to a single file with the ground-truth RTTM segments from some dataset. We used DIHARD 3 dev full cts, but you can use another set if you do not have that data.
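If the reference annotations come as one RTTM file per recording, they can be concatenated into the single file that RTTMS_FILE expects. The sketch below assumes a directory of `*.rttm` files in the standard 10-field format (`SPEAKER <file> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>`); the function name `merge_rttms` and the demo recordings are hypothetical.

```shell
# Illustrative helper: concatenate all SPEAKER lines from <rttm_dir>/*.rttm into <output_file>.
# Write the output outside <rttm_dir> so it is not picked up by the glob on a later run.
merge_rttms() {
    cat "$1"/*.rttm | awk '$1 == "SPEAKER"' > "$2"
}

# Demo with two made-up recordings.
demo_dir=$(mktemp -d)
printf 'SPEAKER rec1 1 0.00 2.50 <NA> <NA> spkA <NA> <NA>\n' > "$demo_dir/rec1.rttm"
printf 'SPEAKER rec2 1 1.20 3.00 <NA> <NA> spkB <NA> <NA>\n' > "$demo_dir/rec2.rttm"

merged=$(mktemp)
merge_rttms "$demo_dir" "$merged"
cat "$merged"
```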

I hope this helps.

fnlandini commented 1 year ago

Closing due to inactivity. Feel free to reopen.

Jiang-Yidi commented 1 year ago

Hi! Thanks for your excellent work.

I have a similar question. I have run prepareKaldidata_VoxCeleb2.sh to generate the Kaldi-style data. How can we get the segments file?

As your paper mentions: VoxCeleb2 consists of more than 2400 hours of recordings from more than 6000 speakers speaking mostly English. Originally prepared as a training set for training speaker recognition systems, the recordings are partially annotated. This means that for the speakers of interest, some of their segments are identified. Thus, it is possible to derive speech segments for a given speaker without the need for any VAD system.

Does this mean we don't need the segments file? If so, how can we run generate_data.sh successfully?

fnlandini commented 1 year ago

Hi @Jiang-Yidi. For the experiments, we still ran a public VAD on the files to obtain the VAD segments, to be sure about them. I cannot share the segments file here because it is (only a bit) larger than 25MB, but you can write me an email and I'll send it as an attachment.

SamuelBroughton commented 1 year ago

Hi @fnlandini, thanks for your contribution. I have also sent an email about the segments files. Thanks in advance!

fnlandini commented 4 months ago

The segments have been uploaded here: https://github.com/BUTSpeechFIT/EEND_dataprep/tree/main/v2/VoxCeleb2/VAD

BUTSpeechFIT / EEND_dataprep

Questions about generating simulation data #2