https://github.com/NVIDIA/NeMo/pull/7737 can fix the CUDA out-of-memory error in long-audio clustering.
Theoretically, the maximum audio length can be extended by a factor of unit_window_len/sub_cluster_n. For instance, with the default settings, if the original clustering hits the memory limit at the 1-hour mark, the long-form clustering could handle up to 20 hours without exhausting memory.
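To make the arithmetic concrete, here is a rough sketch of the scaling estimate. The helper `max_audio_hours` is hypothetical (it is not part of NeMo), and the values 10000 and 500 are illustrative assumptions chosen only to reproduce the 20x ratio mentioned above; check the PR for the actual defaults.

```python
def max_audio_hours(baseline_hours: float, unit_window_len: int, sub_cluster_n: int) -> float:
    """Estimate the longest audio that long-form clustering could handle.

    baseline_hours: length at which the original clustering runs out of memory.
    The extension factor is unit_window_len / sub_cluster_n (theoretical upper bound).
    """
    if unit_window_len <= 0 or sub_cluster_n <= 0:
        raise ValueError("unit_window_len and sub_cluster_n must be positive")
    return baseline_hours * unit_window_len / sub_cluster_n


# Illustrative values only (assumed, not confirmed NeMo defaults).
print(max_audio_hours(1.0, unit_window_len=10000, sub_cluster_n=500))  # 20.0
```

This is only the theoretical bound; actual limits will also depend on the GPU and the rest of the pipeline.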
This works well for me! I applied this fix in my fork of this repo.
It allowed me to download and diarize hundreds of podcast episodes, each 2-3 hours long, on an RTX 2080.