BUTSpeechFIT / VBx

Variational Bayes HMM over x-vectors diarization

VBx written in C++ #31

Closed xuanjihe closed 3 years ago

xuanjihe commented 3 years ago

Hello, I would like to ask whether there is an implementation of the VBx algorithm written in C++?

videodanchik commented 3 years ago

I bet it won't be much faster than the current Python version.

fnlandini commented 3 years ago

Hello @xuanjihe No, we are not aware of any implementation in C++

xuanjihe commented 3 years ago

Well, I want to deploy VB in my project, but I don't know whether its real-time factor can meet the requirements for going online, so I wanted to ask whether there is a VBx implementation written in C++.

xuanjihe commented 3 years ago

By the way, I would like to ask what the computational complexity of VBx is. Is it O(TN^2)?

fnlandini commented 3 years ago

Hello @xuanjihe, apologies for the delay. I will reply to both questions together.

To run the code in real time on relatively short audio, the bottleneck will be the x-vector computation. On a CPU it can take some time, and definitely more than VBx itself. With a GPU, the whole recipe should definitely run faster than real time.

If the recording is longer (around 30 minutes or more), the bottleneck can be the initialization step (based on AHC). This step is cubic in the length of the sequence, so for very long recordings it becomes too slow. Other issues (https://github.com/BUTSpeechFIT/VBx/issues/17 and https://github.com/BUTSpeechFIT/VBx/issues/16) have pointed this out, and one way of avoiding the AHC step is to initialize the clusters randomly. This can yield slightly worse diarization performance, but it will run faster.

The VB-HMM part of the algorithm has the complexity you mention, and it is definitely the fastest part of the whole recipe, so it should run faster than real time.
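For illustration, the random initialization mentioned above could be sketched roughly as follows. This is only a sketch under assumptions, not the code from the repository: the function name `random_init_posteriors`, its parameters, and the `smoothing` value are hypothetical, and it assumes the VB-HMM step accepts an initial T x N matrix of per-x-vector speaker posteriors.

```python
import numpy as np


def random_init_posteriors(num_xvectors, num_speakers, smoothing=5.0, seed=0):
    """Build a random T x N initial speaker-posterior matrix (hypothetical helper).

    Each x-vector gets a random hard label, which is then softened with a
    softmax so that no speaker starts with exactly zero probability.
    """
    rng = np.random.default_rng(seed)
    # Random hard assignment: one speaker index per x-vector.
    labels = rng.integers(0, num_speakers, size=num_xvectors)

    # One-hot encode the labels into a T x N matrix.
    one_hot = np.zeros((num_xvectors, num_speakers))
    one_hot[np.arange(num_xvectors), labels] = 1.0

    # Row-wise softmax of the scaled one-hot matrix (assumed smoothing scheme).
    scaled = one_hot * smoothing
    scaled -= scaled.max(axis=1, keepdims=True)  # numerical stability
    q_init = np.exp(scaled)
    q_init /= q_init.sum(axis=1, keepdims=True)
    return q_init


# Example: 1000 x-vectors and an assumed upper bound of 10 speakers.
q_init = random_init_posteriors(num_xvectors=1000, num_speakers=10)
```

Building this matrix costs O(TN), so it scales linearly with the number of x-vectors T instead of cubically like AHC. The number of speakers N has to be chosen as an upper bound in advance.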

I hope this helps. Federico