BUTSpeechFIT / EEND_dataprep


Confusion about the Silence in Simulated Data #6

Closed: XiaoLin-Jiao closed this issue 10 months ago

XiaoLin-Jiao commented 10 months ago

Thank you for your open-source code.

I used callhome1_spk2 (overlap rate about 13.5%, silence rate about 10%) to compute the statistics, and then generated simulated data with the v1 recipe. The overlap rate in my simulated data is about 14%, but silence accounts for about 30%. Is this normal?
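For reference, silence and overlap rates like the ones quoted above can be measured directly from an RTTM. This is a minimal Python sketch, not the recipe's own code: `parse_rttm` and `silence_and_overlap` are hypothetical helpers, and the sketch assumes both rates are fractions of the total recording duration `total_dur`. The RTTM does not store that duration, so it has to come from the audio. The recipe may define the rates differently, for example relative to speech time only.

```python
from collections import defaultdict


def parse_rttm(lines):
    """Group RTTM SPEAKER lines into {recording_id: [(start, end, speaker), ...]}.

    Standard RTTM fields: type, file id, channel, onset, duration,
    <NA>, <NA>, speaker name, <NA>, <NA>.
    """
    segments = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue
        start, dur = float(fields[3]), float(fields[4])
        segments[fields[1]].append((start, start + dur, fields[7]))
    return dict(segments)


def silence_and_overlap(segments, total_dur):
    """Return (silence, overlap) as fractions of total_dur.

    Silence = time with no active segment; overlap = time with two or more
    active segments. Uses a sweep over sorted start/end events, so the
    result does not depend on the order of `segments`.
    """
    events = sorted([(s, +1) for s, _, _ in segments] +
                    [(e, -1) for _, e, _ in segments])
    active, prev_t = 0, 0.0
    silence = overlap = 0.0
    for t, delta in events:
        span = t - prev_t
        if active == 0:
            silence += span
        elif active >= 2:
            overlap += span
        active += delta
        prev_t = t
    silence += max(0.0, total_dur - prev_t)  # trailing silence after the last segment
    return silence / total_dur, overlap / total_dur
```

Because the sweep sorts its events first, these two rates are insensitive to how the RTTM lines are ordered. Statistics built by comparing consecutive segments are not.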

Attached: diff_spk_overlap.txt, diff_spk_pause.txt, diff_spk_pause_vs_overlap.txt, newspk_samespk_pause_distribution_overlap_distribution.txt, overlaps_info.txt, same_spk_pause.txt

fnlandini commented 10 months ago

Hi, that is indeed too high a percentage of silence. Comparing your stats with mine, your same_spk_pause.txt has a distribution much more skewed towards higher values. Your diff_spk_pause.txt has very few samples, and its distribution is also skewed towards longer pauses. This probably explains why you get so much silence; there must be some mismatch between our RTTMs. Here are the stats I calculated for Callhome Part 1 and for Callhome Part 1 restricted to 2 speakers. Could you try using them? I expect the percentage of silence to be different with these.
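The symptoms described here (longer same-speaker pauses, very few different-speaker pauses) are what one would expect if pauses are measured between consecutive RTTM segments that are not in time order. The sketch below is hypothetical and is not the repository's statistics script: `pause_stats` simply walks segments in the order given and classifies each gap by whether the two segments share a speaker.

```python
def pause_stats(segments):
    """Collect gaps between consecutive (start, end, speaker) segments.

    Segments are visited in the order given. Returns three lists:
    same-speaker pauses, different-speaker pauses, and different-speaker
    overlaps (negative gaps, stored as positive durations). Negative
    same-speaker gaps are ignored in this sketch.
    """
    same_pauses, diff_pauses, diff_overlaps = [], [], []
    for (_, end0, spk0), (start1, _, spk1) in zip(segments, segments[1:]):
        gap = start1 - end0
        if spk0 == spk1:
            if gap >= 0:
                same_pauses.append(gap)
        elif gap >= 0:
            diff_pauses.append(gap)
        else:
            diff_overlaps.append(-gap)
    return same_pauses, diff_pauses, diff_overlaps


# The same four turns, once in time order and once grouped by speaker.
by_time = [(0.0, 2.0, "A"), (2.5, 4.0, "B"), (4.5, 6.0, "A"), (6.5, 8.0, "B")]
by_speaker = [by_time[0], by_time[2], by_time[1], by_time[3]]

print(pause_stats(by_time))     # ([], [0.5, 0.5, 0.5], [])
print(pause_stats(by_speaker))  # ([2.5, 2.5], [], [3.5])
```

With the grouped order, every same-speaker pause spans the other speaker's turn, no different-speaker pauses are collected at all, and a spurious 3.5 s overlap appears at the boundary between the two speakers' blocks.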

Attached: stats_ch1.zip, stats_ch1_2spk.zip

XiaoLin-Jiao commented 10 months ago


Thanks for your reply. I found the cause of the problem: the RTTMs I obtained were not sorted by start time but grouped by speaker, which made the computed statistics wrong. After reordering them, I got the same results as you.
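The fix described here amounts to sorting each recording's RTTM lines by onset time before computing the statistics. A minimal Python sketch, assuming standard 10-field RTTM lines; `sort_rttm_lines` is a hypothetical helper, not part of the repository:

```python
def sort_rttm_lines(lines):
    """Sort RTTM SPEAKER lines by recording id, then onset, then duration.

    Onset and duration are compared as floats; a plain string sort would
    put "10.00" before "9.00".
    """
    def key(line):
        fields = line.split()
        return (fields[1], float(fields[3]), float(fields[4]))

    speaker_lines = [line for line in lines
                     if line.split() and line.split()[0] == "SPEAKER"]
    return sorted(speaker_lines, key=key)


# Example: lines grouped by speaker, as in the RTTMs described above.
rttm = [
    "SPEAKER rec1 1 0.50 2.00 <NA> <NA> A <NA> <NA>",
    "SPEAKER rec1 1 10.00 1.00 <NA> <NA> A <NA> <NA>",
    "SPEAKER rec1 1 9.00 0.80 <NA> <NA> B <NA> <NA>",
]
for line in sort_rttm_lines(rttm):
    print(line)  # onsets in order: 0.50, 9.00, 10.00
```

Any non-SPEAKER lines are dropped here for simplicity; a real script might want to keep them.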

fnlandini commented 10 months ago

Thank you for the feedback and glad it works. If you think the issue is solved, feel free to close it.