Closed kli017 closed 2 years ago
Hi, thank you for the interest. That is a very good question indeed, and sorry it was not clear from the code. Your format is almost the same as the expected one.
As an example, $SEG_LIST_FILE contains lines like
100304-f-sre2006 100304-f-sre2006-kacg-A 0.00 2.20
100304-f-sre2006 100304-f-sre2006-kacg-A 2.67 6.09
100304-f-sre2006 100304-f-sre2006-kacg-A 6.57 10.05
100304-f-sre2006 100304-f-sre2006-kacg-A 10.05 10.72
100304-f-sre2006 100304-f-sre2006-kacg-A 10.80 16.27
100304-f-sre2006 100304-f-sre2006-kacg-A 16.52 22.12
100304-f-sre2006 100304-f-sre2006-kacg-A 22.66 25.15
100304-f-sre2006 100304-f-sre2006-kacg-A 25.34 28.86
100304-f-sre2006 100304-f-sre2006-kacg-A 29.05 29.79
100304-f-sre2006 100304-f-sre2006-kacg-A 30.19 33.55
where the columns are speaker_id, wav_id, start time and end time. So you should only modify your last column: your file has a duration there, while an end time is expected.
The logic in that block of code (lines 87 to 108) gathers the segments and generates the train and validation lists, so you should only adapt lines 87 to 98.
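For illustration, the last-column change could be sketched in Python as below. This is only a sketch and not part of the repository: the function name is hypothetical, and the two-decimal output is an assumption based on the example lines above.

```python
def duration_to_end_time(line):
    """Turn 'speaker_id wav_id start duration' into 'speaker_id wav_id start end'."""
    speaker_id, wav_id, start, duration = line.split()
    start = float(start)
    end = start + float(duration)  # end time = start time + duration
    # Two decimals is an assumption that mirrors the example lines in this thread.
    return f"{speaker_id} {wav_id} {start:.2f} {end:.2f}"


# Example line taken from the original question in this thread.
converted = duration_to_end_time("aaaa wav1 1.0 2.3")
print(converted)  # aaaa wav1 1.00 3.30
```

Applying this to every line of a duration-based file should give a list in the four-column layout described above.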
I hope this helps.
Thank you for the help. It's clear now; I will try it with my data.
@fnlandini Hello, I ran into an error while preparing my custom simulated conversations. At line 125 in conv_generator.py:
selected_speakers = np.random.choice(speakers, nspks, replace=False)
speakers is a list, and I got an error:
File "./conv_generator.py", line 127, in
Someone said it might be because of the NumPy version; mine is 1.19.2.
Solved by replacing line 125 with:
index = np.random.choice(len(speakers), nspks, replace=False)
selected_speakers = [speakers[idx] for idx in index]
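A self-contained sketch of that workaround follows. The exact error is not shown in this thread, so the cause here is a guess: np.random.choice requires a 1-dimensional input, and a list whose elements are themselves sequences is converted to a 2-D array and rejected. Sampling positions instead of elements avoids that conversion. The speaker entries and the helper name below are hypothetical.

```python
import numpy as np


def sample_speakers(speakers, nspks, rng):
    """Pick nspks distinct entries by sampling list positions instead of elements."""
    # Sampling from range(len(speakers)) keeps NumPy from converting the entries themselves.
    index = rng.choice(len(speakers), size=nspks, replace=False)
    return [speakers[idx] for idx in index]


# Hypothetical entries; the real structure of `speakers` is not shown in the thread.
speakers = [("spk1", "wavA"), ("spk2", "wavB"), ("spk3", "wavC")]
rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
selected_speakers = sample_speakers(speakers, 2, rng)
print(selected_speakers)
```

The returned entries keep their original Python types, which would not be the case if NumPy had converted them into an array.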
@fnlandini Does it make sense to replace this $SEG_LIST_FILE format with loading utt2spk/spk2utt? The format is strangely similar to segments, and it looks like people are confused by this (I was too, as you know). I may be missing some issue with this, so let me know.
@Jamiroquai88 $SEG_LIST_FILE has the Kaldi segments format, if I'm not mistaken. I'm not sure I understood the question.
In the comment above you said that $SEG_LIST_FILE has columns: speaker_id, wav_id, start and end times, while the segments file has columns: segment_id, wav_id, start, end.
I am just saying that we don't need to create a new file; we could instead use utt2spk/spk2utt to map from segment_id to speaker_id.
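As a rough sketch of that idea (not code from the repository), a Kaldi segments file with lines "segment_id wav_id start end" and a utt2spk file with lines "segment_id speaker_id" could be joined into the $SEG_LIST_FILE layout as below. The function name and the IDs are hypothetical.

```python
def segments_to_seg_list(segments_lines, utt2spk_lines):
    """Join Kaldi-style segments and utt2spk lines into 'speaker_id wav_id start end' lines."""
    # utt2spk: one "segment_id speaker_id" pair per line.
    utt2spk = dict(line.split() for line in utt2spk_lines if line.strip())
    seg_list = []
    for line in segments_lines:
        if not line.strip():
            continue  # skip blank lines
        segment_id, wav_id, start, end = line.split()
        # Replace the segment_id column with the speaker it maps to.
        seg_list.append(f"{utt2spk[segment_id]} {wav_id} {start} {end}")
    return seg_list


# Hypothetical IDs, for illustration only.
segments = ["seg0001 wavA 0.00 2.20", "seg0002 wavA 2.67 6.09"]
utt2spk = ["seg0001 spkX", "seg0002 spkX"]
seg_list = segments_to_seg_list(segments, utt2spk)
print("\n".join(seg_list))
```

A segment_id missing from utt2spk raises a KeyError here, which seems preferable to silently dropping segments.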
@Jamiroquai88 You are right, it could be possible to handle that in the code rather than creating an extra file. For the time being I'll keep it as is, but thanks for the suggestion. I'll try to fix it in the future, as others might find it strange as well.
@fnlandini Hello, for the $SEG_LIST_FILE example you gave, does each wav_id contain only one speaker?
@kli017 Yes, it is expected to have one speaker per wav.
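A quick way to check a custom list against that expectation might look like the sketch below. It assumes the four-column "speaker_id wav_id start end" layout discussed in this thread; the helper name and the sample lines are hypothetical.

```python
from collections import defaultdict


def wavs_with_multiple_speakers(seg_list_lines):
    """Return {wav_id: speaker_ids} for every wav_id listed with more than one speaker."""
    speakers_per_wav = defaultdict(set)
    for line in seg_list_lines:
        if not line.strip():
            continue  # skip blank lines
        speaker_id, wav_id, _start, _end = line.split()
        speakers_per_wav[wav_id].add(speaker_id)
    return {wav: spks for wav, spks in speakers_per_wav.items() if len(spks) > 1}


# Hypothetical lines: wavB is listed with two different speakers.
lines = [
    "spk1 wavA 0.00 2.20",
    "spk1 wavA 2.67 6.09",
    "spk2 wavB 0.00 1.50",
    "spk3 wavB 1.80 3.10",
]
violations = wavs_with_multiple_speakers(lines)
print(violations)
```

An empty result means every wav_id in the list is associated with a single speaker.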
@fnlandini OK, thank you for the quick reply 👍
Hello, interesting work. I want to apply this code to a custom dataset to get the statistics and generate simulated conversation data, but I have some trouble understanding some operations in the code. Because I do not have the SRE and Switchboard datasets, I cannot understand the code from lines 87 to 108 (processing segments). I noticed that the awk code is different for each dataset, and I don't know which one is suitable for my data. I was wondering if you could provide some example input files. The format of my segment file is: speaker_id wav_id start_time duration_time, for example: aaaa wav1 1.0 2.3