NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0

Question: Difference of paths2audio_files param and path in manifest file for speaker diarization with ClusteringDiarizer #9789

Open rugrill opened 1 month ago

rugrill commented 1 month ago

Hello, I first used the ClusteringDiarizer as shown in the provided tutorial https://github.com/NVIDIA/NeMo/blob/main/tutorials/speaker_tasks/Speaker_Diarization_Inference.ipynb. I provided the path to my audio file in the manifest file under the audio_filepath key, and the file was diarized with good results.

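from omegaconf import OmegaConf
from nemo.collections.asr.models import ClusteringDiarizer

config = OmegaConf.load("path/to/config")  # config that points to the manifest file
sd_model = ClusteringDiarizer(cfg=config)
sd_model.diarize()  # no arguments: audio paths are read from the manifest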

However, I then tried calling the .diarize() method with the paths2audio_files argument, passing the path to my file directly as a parameter.

paths = ["path/to/audio/file/1"]
sd_model.diarize(paths)

The results for the same files are significantly worse with paths2audio_files than when the path is provided in the manifest file. Unfortunately, I can't find much documentation on how the two options differ in terms of the actual speaker diarization.

Furthermore, I noticed that when more than one file is passed to paths2audio_files, diarization is much faster than diarizing the files one after another and changing the path in the manifest file each time.

So the question is: what are the differences between the two approaches, and what influences the diarization result for the same audio file?
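
For comparing the two approaches on the same set of files, it can help to put every file into a single manifest instead of editing the path between runs. The following is only a minimal sketch: the helper names write_manifest and read_manifest are hypothetical (not part of NeMo), and the manifest keys are assumed to be the ones shown in the linked tutorial. It uses only the Python standard library and does not call NeMo itself.

```python
import json
from pathlib import Path


def write_manifest(audio_paths, manifest_path, num_speakers=None):
    """Write one JSON line per audio file (hypothetical helper, not a NeMo API).

    The keys below are assumed to match the manifest entries shown in the
    Speaker_Diarization_Inference tutorial; adjust them if your version differs.
    """
    manifest_path = Path(manifest_path)
    with manifest_path.open("w", encoding="utf-8") as fp:
        for audio_path in audio_paths:
            entry = {
                "audio_filepath": str(audio_path),
                "offset": 0,
                "duration": None,       # None: use the whole file
                "label": "infer",
                "text": "-",
                "num_speakers": num_speakers,  # None if unknown
                "rttm_filepath": None,  # no reference annotation supplied
                "uem_filepath": None,
            }
            fp.write(json.dumps(entry) + "\n")
    return manifest_path


def read_manifest(manifest_path):
    """Read the manifest back as a list of dicts, skipping blank lines."""
    with open(manifest_path, encoding="utf-8") as fp:
        return [json.loads(line) for line in fp if line.strip()]
```

Assuming the config exposes the manifest location as diarizer.manifest_filepath, as in the tutorial, pointing it at the written file and calling sd_model.diarize() without arguments would process all listed files in one run, which makes a like-for-like comparison with diarize(paths) easier.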

nesibe28 commented 1 month ago

Hi, I have the same issue! I'm getting frustrated.

nithinraok commented 1 month ago

Thanks for bringing this up. It looks like the code path taken when audio files are passed directly diverges from the one used with manifests.

@weiqingw4ng please have a look at this issue.