MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
BSD 2-Clause "Simplified" License

NeMo OutOfMemoryError: CUDA out of memory. #76

Closed: TheGermanEngie closed this issue 1 year ago

TheGermanEngie commented 1 year ago

Great updates to the Colab! However, on the free tier's T4, the NeMo MSDD diarization model fails during clustering with both GPU fp16 and int8:

creating speech segments: 100%|██████████| 1/1 [00:15<00:00, 15.62s/it]

[NeMo I 2023-08-13 18:17:55 clustering_diarizer:287] Subsegmentation for embedding extraction: scale0, /content/temp_outputs/speaker_outputs/subsegments_scale0.json
[NeMo I 2023-08-13 18:17:56 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2023-08-13 18:17:56 collections:298] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2023-08-13 18:17:56 collections:299] Dataset loaded with 6287 items, total duration of 2.05 hours.
[NeMo I 2023-08-13 18:17:56 collections:301] # 6287 files loaded accounting to # 1 labels

[1/5] extract embeddings: 100%|██████████| 99/99 [06:04<00:00, 3.69s/it]

[NeMo I 2023-08-13 18:24:02 clustering_diarizer:389] Saved embedding files to /content/temp_outputs/speaker_outputs/embeddings
[NeMo I 2023-08-13 18:24:02 clustering_diarizer:287] Subsegmentation for embedding extraction: scale1, /content/temp_outputs/speaker_outputs/subsegments_scale1.json
[NeMo I 2023-08-13 18:24:02 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2023-08-13 18:24:02 collections:298] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2023-08-13 18:24:02 collections:299] Dataset loaded with 7556 items, total duration of 2.17 hours.
[NeMo I 2023-08-13 18:24:02 collections:301] # 7556 files loaded accounting to # 1 labels

[2/5] extract embeddings: 100%|██████████| 119/119 [06:37<00:00, 3.34s/it]

[NeMo I 2023-08-13 18:30:42 clustering_diarizer:389] Saved embedding files to /content/temp_outputs/speaker_outputs/embeddings
[NeMo I 2023-08-13 18:30:42 clustering_diarizer:287] Subsegmentation for embedding extraction: scale2, /content/temp_outputs/speaker_outputs/subsegments_scale2.json
[NeMo I 2023-08-13 18:30:42 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2023-08-13 18:30:42 collections:298] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2023-08-13 18:30:42 collections:299] Dataset loaded with 9319 items, total duration of 2.26 hours.
[NeMo I 2023-08-13 18:30:42 collections:301] # 9319 files loaded accounting to # 1 labels

[3/5] extract embeddings: 100%|██████████| 146/146 [07:19<00:00, 3.01s/it]

[NeMo I 2023-08-13 18:38:04 clustering_diarizer:389] Saved embedding files to /content/temp_outputs/speaker_outputs/embeddings
[NeMo I 2023-08-13 18:38:04 clustering_diarizer:287] Subsegmentation for embedding extraction: scale3, /content/temp_outputs/speaker_outputs/subsegments_scale3.json
[NeMo I 2023-08-13 18:38:05 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2023-08-13 18:38:05 collections:298] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2023-08-13 18:38:05 collections:299] Dataset loaded with 12482 items, total duration of 2.38 hours.
[NeMo I 2023-08-13 18:38:05 collections:301] # 12482 files loaded accounting to # 1 labels

[4/5] extract embeddings: 100%|██████████| 196/196 [07:53<00:00, 2.42s/it]

[NeMo I 2023-08-13 18:46:05 clustering_diarizer:389] Saved embedding files to /content/temp_outputs/speaker_outputs/embeddings
[NeMo I 2023-08-13 18:46:05 clustering_diarizer:287] Subsegmentation for embedding extraction: scale4, /content/temp_outputs/speaker_outputs/subsegments_scale4.json
[NeMo I 2023-08-13 18:46:05 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2023-08-13 18:46:05 collections:298] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2023-08-13 18:46:05 collections:299] Dataset loaded with 19112 items, total duration of 2.52 hours.
[NeMo I 2023-08-13 18:46:05 collections:301] # 19112 files loaded accounting to # 1 labels

[5/5] extract embeddings: 100%|██████████| 299/299 [08:36<00:00, 1.73s/it]

[NeMo I 2023-08-13 18:54:55 clustering_diarizer:389] Saved embedding files to /content/temp_outputs/speaker_outputs/embeddings

clustering: 0%| | 0/1 [00:48<?, ?it/s]


OutOfMemoryError Traceback (most recent call last)

in <cell line: 3>()
      1 # Initialize NeMo MSDD diarization model
      2 msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to("cuda")
----> 3 msdd_model.diarize()
      4
      5 del msdd_model

8 frames

/usr/local/lib/python3.10/dist-packages/nemo/collections/asr/parts/utils/offline_clustering.py in getKneighborsConnections(affinity_mat, p_value, mask_method)
    325     dim = affinity_mat.shape
    326     binarized_affinity_mat = torch.zeros_like(affinity_mat).half()
--> 327     sorted_matrix = torch.argsort(affinity_mat, dim=1, descending=True)[:, :p_value]
    328     binarized_affinity_mat[sorted_matrix.T, torch.arange(affinity_mat.shape[0])] = (
    329         torch.ones(1).to(affinity_mat.device).half()

OutOfMemoryError: CUDA out of memory. Tried to allocate 4.08 GiB (GPU 0; 14.75 GiB total capacity; 9.05 GiB already allocated; 4.06 GiB free; 9.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
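The error message itself suggests one mitigation: setting `max_split_size_mb` via `PYTORCH_CUDA_ALLOC_CONF` to reduce allocator fragmentation. A minimal sketch of how that could be wired into the notebook (the value `512` is an arbitrary starting point, not a tested recommendation, and this only helps when reserved memory far exceeds allocated memory):

```python
import os

# Must be set before torch first initializes CUDA (i.e., before the
# first CUDA tensor is created), or the allocator will ignore it.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

# ...then import torch / NeMo and run the diarization cell as before.
```

In a Colab notebook this would need to run in a cell before any `import torch`, or be restarted via "Restart runtime" first.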

TheGermanEngie commented 1 year ago

I suppose I'm just using too big an audio source file for the free version of Colab. My error.
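That conclusion checks out on a back-of-the-envelope estimate (assuming, per the traceback, that clustering builds an affinity matrix that is square in the number of subsegments): the scale4 pass above produced 19112 subsegments, so the matrices involved grow quadratically with audio length:

```python
def square_matrix_gib(n: int, bytes_per_elem: int) -> float:
    """Memory for an n x n matrix of fixed-size elements, in GiB."""
    return n ** 2 * bytes_per_elem / 2 ** 30

n = 19112  # subsegments at scale4, from the log above

# fp16 affinity matrix (2 bytes/element): ~0.68 GiB
print(f"fp16 affinity matrix: {square_matrix_gib(n, 2):.2f} GiB")

# int64 indices returned by torch.argsort (8 bytes/element): ~2.72 GiB
print(f"int64 argsort output: {square_matrix_gib(n, 8):.2f} GiB")
```

Because the footprint is quadratic in segment count, halving the audio length cuts it roughly fourfold, which is why splitting a long recording into chunks before diarization can fit within a free-tier T4's ~15 GiB.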