Vaibhavs10 / insanely-fast-whisper

Apache License 2.0
6.94k stars 505 forks

Cuda out of Memory Error #140

Closed omarsiddiqi224 closed 6 months ago

omarsiddiqi224 commented 6 months ago

I set up the program to work with Gradio, here is a snippet of the code:

```python
import os
import subprocess

import torch


def transcribe2(audio_file):
    if audio_file:
        head, tail = os.path.split(audio_file)
        path = head

        if tail[-3:] != 'wav':
            subprocess.call(['ffmpeg', '-i', audio_file, "audio.wav", '-y'])
            tail = "audio.wav"

        subprocess.call(['ffmpeg', '-i', audio_file, "audio.wav", '-y'])
        tail = "audio.wav"
        print("before diarize")
        os.system(f"insanely-fast-whisper --file-name {tail} --hf_token BLANK --flash True")
        #subprocess.run([f"insanely-fast-whisper --file-name {tail} --hf_token BLANK"], shell=True, capture_output=True, text=True)
        print("after diarize")
        #subprocess.run(["python cleanup.py"], shell=True, capture_output=True, text=True)
        #os.system("python cleanup.py")

        text = read_file_content('audio.txt')
        print("after reading")
        fixed = text.strip()
        summarized = summarize(fixed)
        #print("after summary")
        global transcripts
        transcripts = text
        torch.cuda.empty_cache()
        return (text, summarized)
```
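As an aside on the shell-out above: interpolating the filename into an `os.system` f-string breaks if an upload's name contains spaces or shell metacharacters. A small sketch of building the same command as an argv list instead (flag names are copied from the snippet and the replies in this thread; `build_whisper_cmd` is a hypothetical helper, not part of insanely-fast-whisper):

```python
def build_whisper_cmd(wav_path, hf_token, batch_size=2, flash=True):
    """Build the insanely-fast-whisper invocation as an argv list.

    An argv list passed to subprocess.run (without shell=True) avoids
    shell quoting problems that os.system with an f-string has.
    """
    cmd = [
        "insanely-fast-whisper",
        "--file-name", wav_path,
        "--hf_token", hf_token,
        "--batch-size", str(batch_size),  # lower values reduce peak VRAM
    ]
    if flash:
        cmd += ["--flash", "True"]
    return cmd


# Usage (hypothetical): subprocess.run(build_whisper_cmd("audio.wav", "hf_..."))
```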

Once I start it up and upload an audio file, it works beautifully. However, if I upload a second audio file, it breaks and gives me the following error:

```
key_states = torch.cat([past_key_value[0].transpose(1, 2), key_states], dim=1)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacty of 21.99 GiB of which 15.75 MiB is free. Process 140926 has 14.49 GiB memory in use. Including non-PyTorch memory, this process has 7.49 GiB memory in use. Of the allocated memory 6.80 GiB is allocated by PyTorch, and 460.97 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

If I don't diarize, I can upload multiple audio files sequentially without any error, though after around five files it still breaks. With diarization it breaks on the second upload.
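One thing worth knowing about the `torch.cuda.empty_cache()` call in the snippet: it only releases blocks that PyTorch has cached but no longer references, so any model or tensor still reachable from Python keeps its memory, which is consistent with usage creeping up across uploads. A hedged sketch of a fuller cleanup step (`free_gpu_memory` is a hypothetical helper, not part of insanely-fast-whisper):

```python
import gc
import importlib.util


def free_gpu_memory():
    """Drop dead Python references, then release cached CUDA blocks.

    gc.collect() breaks reference cycles that keep tensors alive;
    only after that can empty_cache() actually return their memory
    to the driver. Tensors still reachable (e.g. a retained model or
    the global `transcripts`-style state) are NOT freed by either call.
    """
    gc.collect()
    # Guard the torch import so this sketch also runs on CPU-only hosts.
    if importlib.util.find_spec("torch") is not None:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
```

Calling this (and `del`-ing any large objects first) between uploads may delay the OOM, though it cannot help if the memory is held by a still-loaded model.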

Vaibhavs10 commented 6 months ago

I think the issue is with the --batch-size. You can set it to a lower value; --batch-size 2, for example, should work well.

Vaibhavs10 commented 6 months ago

(closing this for now, feel free to re-open if that doesn't fix it).

hatimkh20 commented 5 months ago

I am getting this error:

```
CUDA out of memory. Tried to allocate 52.00 MiB. GPU 0 has a total capacty of 11.00 GiB of which 4.47 GiB is free. Of the allocated memory 5.22 GiB is allocated by PyTorch, and 257.79 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

iSuslov commented 4 months ago

@Vaibhavs10 having the same issue when using --timestamp word.

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB. GPU 0 has a total capacity of 21.96 GiB of which 896.00 KiB is free. Including non-PyTorch memory, this process has 21.95 GiB memory in use. Of the allocated memory 21.14 GiB is allocated by PyTorch, and 591.83 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```

This happens for batch sizes > 4. Tried on L4 and T4 GPUs.
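The traceback above suggests setting `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`, which can reduce fragmentation-driven OOMs. When the CLI is launched from a wrapper like the Gradio app earlier in this thread, the variable can be injected into the subprocess environment without touching the CLI. A minimal sketch (`alloc_conf_env` is a hypothetical helper; the variable name comes from the error message):

```python
import os


def alloc_conf_env(base_env=None):
    """Return a copy of the environment with the PyTorch allocator
    configured for expandable segments, as the OOM message suggests."""
    env = dict(base_env if base_env is not None else os.environ)
    env["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
    return env


# Usage (hypothetical):
# subprocess.run(["insanely-fast-whisper", "--file-name", "audio.wav"],
#                env=alloc_conf_env())
```

Note this mitigates fragmentation only; if the model plus a large batch genuinely exceeds VRAM, lowering --batch-size is still required.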