adding long-form audio speaker diarization

MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

BSD 2-Clause "Simplified" License


adding long-form audio speaker diarization #125

Closed ALEXuH closed 3 months ago

ALEXuH commented 8 months ago

https://github.com/NVIDIA/NeMo/pull/7737 can fix the CUDA out-of-memory error when clustering long audio. Theoretically, the maximum audio length can be extended by a factor of unit_window_len/sub_cluster_n. For instance, by default, if the original clustering hits the memory limit at the 1-hour mark, the long-form clustering could handle up to 20 hours without exhausting memory.
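
To make the arithmetic concrete, here is a rough sketch of the estimate. The function name `max_audio_hours` is hypothetical, and the parameter values are placeholders chosen only to reproduce the 20x ratio mentioned above; they are not confirmed NeMo defaults, so check the PR for the real ones.

```python
def max_audio_hours(baseline_hours: float, unit_window_len: int, sub_cluster_n: int) -> float:
    """Estimate the longest audio that long-form clustering could handle.

    baseline_hours: length at which the original clustering runs out of memory.
    The extension factor is unit_window_len / sub_cluster_n, as described above.
    """
    if unit_window_len <= 0 or sub_cluster_n <= 0:
        raise ValueError("unit_window_len and sub_cluster_n must be positive")
    return baseline_hours * unit_window_len / sub_cluster_n


# Placeholder values (assumption): any pair with a ratio of 20 gives the same result.
estimate = max_audio_hours(1.0, unit_window_len=1000, sub_cluster_n=50)
print(f"Estimated limit: {estimate:.0f} hours")
```

This is only a theoretical upper bound; the real limit will also depend on the GPU and on the memory used by the rest of the pipeline.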

MahmoudAshraf97 commented 7 months ago

Sorry for the huge delay. I'd prefer to wait until this lands in a stable release of NeMo before merging, as NeMo is already problematic enough.

Teapack1 commented 4 months ago

This works well for me! I applied this fix in my fork of this repo. It allowed me to download and diarize hundreds of podcast episodes, each 2-3 hours long, on an RTX 2080.

Thank you!