BUTSpeechFIT / VBx

Variational Bayes HMM over x-vectors diarization

Error Handling for the empty/silence audio files #19

Closed vinayentc closed 3 years ago

vinayentc commented 3 years ago

Hello Federico, good evening!

Currently, if we process an empty or silent audio file, the generated *.lab file is empty, so the files derived from it are empty as well, which leads to the error below. Could you please suggest how to handle this case efficiently in the code? Exception handling can certainly be done, but I wanted to understand whether there is a better way to process such files.

Thanks, Vinay

Error:

    CRITICAL - index 0 is out of bounds for axis 1 with size 0
    Traceback (most recent call last):
      File "/services/speech_diarization/service_streaming_unsupervised.py", line 169, in get_streaming_unsupervised
        predict.predict_embeddings(directory_path_session, combined_audio_file, audio_file_path,
      File "/services/speech_diarization/unsupervised_diarization/predict_streaming.py", line 158, in predict_embeddings
        seg = signal[labs[segnum, 0]:labs[segnum, 1]]
    IndexError: index 0 is out of bounds for axis 1 with size 0
    CRITICAL - index 0 is out of bounds for axis 1 with size 0
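For reference, the same IndexError message can be reproduced in isolation. This is only a sketch of one plausible cause: it assumes that loading the empty *.lab file yields an empty 1-D array and that the result is wrapped with `np.atleast_2d` before indexing. The actual contents of predict_streaming.py are not shown in this thread, so that is an assumption.

```python
import numpy as np

# Assumption: an empty .lab file produces a 1-D array with no elements.
empty_labels = np.array([])

# atleast_2d turns it into one row with zero columns.
labs = np.atleast_2d(empty_labels)
print(labs.shape)  # (1, 0)

# Indexing the first column of that row fails on axis 1.
try:
    labs[0, 0]
except IndexError as err:
    message = str(err)
    print(message)
```

If that is what happens, the row index is valid but there are no start/end columns to read, which matches "axis 1 with size 0".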

fnlandini commented 3 years ago

Hello Vinay, I think the best approach is to skip the rest of the processing once you run VAD and find that there is no speech. In that case you would simply return an empty rttm file, if your pipeline needs to output one. Since this is part of a product (it seems you have plugged VBx into your infrastructure), you might want to know whether it is possible to receive recordings without speech. If you assume there is always speech, then you may need to tune your VAD for higher recall, but bear in mind that the quality of the diarization output might degrade if you start including too much non-speech.
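A minimal sketch of that early exit, using only the standard library, might look like the following. The function names, the `run_diarization` callable and the assumed label format (one `start end [label]` line per speech segment, in seconds) are hypothetical and not part of VBx; adapt them to whatever your pipeline actually uses.

```python
from pathlib import Path


def read_speech_segments(lab_path):
    """Parse a VAD label file into a list of (start, end) tuples.

    Assumed format: one 'start end [label]' line per speech segment.
    """
    segments = []
    for line in Path(lab_path).read_text().splitlines():
        fields = line.split()
        if len(fields) >= 2:
            segments.append((float(fields[0]), float(fields[1])))
    return segments


def diarize_or_skip(lab_path, rttm_path, run_diarization):
    """Run diarization only if VAD found speech.

    run_diarization is a hypothetical callable standing in for the
    rest of the pipeline (embedding extraction, clustering, output).
    Returns True if diarization ran, False if the file was skipped.
    """
    segments = read_speech_segments(lab_path)
    if not segments:
        # No speech detected: write an empty rttm and stop here.
        Path(rttm_path).write_text("")
        return False
    run_diarization(segments, rttm_path)
    return True
```

Checking right after VAD keeps the silent-file case out of the embedding extraction code entirely, so no exception handling is needed further down.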

vinayentc commented 3 years ago

Hello Federico,

Thanks for the response. Since the input will be chunks of about 15-20 seconds, silent recordings are possible. I will handle the silent files as you suggested, so I am closing the issue for now.

Thanks, Vinay