BUTSpeechFIT / VBx

Variational Bayes HMM over x-vectors diarization

inconsistency between code and technical report #68

Closed by alephpi 6 months ago

alephpi commented 6 months ago

The technical report states:

"3.2. Diarization pipeline. To perform the diarization, each input recording is first split into speech segments according to the oracle VAD and the segments shorter than 0.1 s are discarded. From these segments, x-vectors are extracted every 0.25 s from overlapping sub-segments of 1.5 s (or less than 1.5 s for the last sub-segments or shorter segments). The x-vectors are centered, whitened and length normalized (Garcia-Romero and Espy-Wilson, 2011) (which is also done for the PLDA training data)."

However, in predict.py only segments shorter than 0.01 s are discarded, not 0.1 s: https://github.com/BUTSpeechFIT/VBx/blob/57466e6e245d5cdfe2e88ee6503702ace3ffdd03/VBx/predict.py#L168

Also, x-vectors are extracted every 0.24 s from overlapping sub-segments of 1.44 s, not every 0.25 s from sub-segments of 1.5 s: https://github.com/BUTSpeechFIT/VBx/blob/57466e6e245d5cdfe2e88ee6503702ace3ffdd03/VBx/predict.py#L89-L90
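
To make the two mismatches concrete, here is a rough sketch that compares both parameter sets on a toy example. It is not the code from predict.py: the function names `keep_segments` and `subsegments` are hypothetical, times are in integer milliseconds to avoid rounding issues, and the handling of the last (shorter) sub-segment is a simplifying assumption based on the wording in the report.

```python
def keep_segments(durations_ms, min_ms):
    """Keep only segments at least min_ms long (shorter ones are discarded)."""
    return [d for d in durations_ms if d >= min_ms]


def subsegments(segment_ms, window_ms, hop_ms):
    """Return (start, end) spans of overlapping sub-segments, in ms.

    Full-length windows are placed every hop_ms. If they do not reach the
    end of the segment, one shorter trailing sub-segment is added
    (assumption: this mirrors "less than 1.5 s for the last sub-segments").
    """
    if segment_ms <= 0:
        return []
    spans = []
    start = 0
    while start + window_ms <= segment_ms:
        spans.append((start, start + window_ms))
        start += hop_ms
    if not spans or spans[-1][1] < segment_ms:
        spans.append((start, segment_ms))
    return spans


# Toy VAD segment durations in ms (made-up values for illustration only).
durations = [5, 50, 2000]
report_kept = keep_segments(durations, min_ms=100)  # report: 0.1 s
code_kept = keep_segments(durations, min_ms=10)     # issue: 0.01 s in code

# Sub-segments of a 10 s segment under each setting.
report_spans = subsegments(10_000, window_ms=1500, hop_ms=250)
code_spans = subsegments(10_000, window_ms=1440, hop_ms=240)

print(report_kept, code_kept)
print(len(report_spans), len(code_spans))
```

Under these assumptions, the 0.01 s threshold keeps the 50 ms segment that the 0.1 s threshold would drop, and the 10 s segment yields a slightly different number of x-vectors (the window and hop each differ by 4%).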

fnlandini commented 6 months ago

Hi, yes, that is correct. However, the difference is quite small.