CUDA runs out of memory without Gradio, but there is no error if Gradio is used.

I am running audiocraft in AWS G4dn.xlarge with NVDA T4.

I have no error using Gradio with model 'facebook/musicgen-large', duration 30s https://github.com/facebookresearch/audiocraft/blob/main/demos/musicgen_app.py

INFO:audiocraft.modules.conditioners:T5 will be evaluated with autocast as float32

but if I run in python with this

import torch
import gc

from audiocraft.data.audio_utils import convert_audio
from audiocraft.data.audio import audio_write
from audiocraft.models.encodec import InterleaveStereoCompressionModel
from audiocraft.models import MusicGen, MultiBandDiffusion
MODEL = None  # Last used model

def clear_gpu_memory():
    torch.cuda.empty_cache()
    gc.collect()

def load_model(version='facebook/musicgen-large'):
    global MODEL
    print("Loading model", version)
    if MODEL is None or MODEL.name != version:
        # Clear PyTorch CUDA cache and delete model
        del MODEL
        torch.cuda.empty_cache()
        MODEL = None  # in case loading would crash
        MODEL = MusicGen.get_pretrained(version)

  USE_DIFFUSION = False
  progress = True
  texts = "cafe music"
  clear_gpu_memory()
  load_model('facebook/musicgen-large')
  MODEL.set_generation_params(duration=30, top_k=int(250), top_p=0,temperature=1, cfg_coef=3)
  outputs = MODEL.generate(texts, progress=progress, return_tokens=USE_DIFFUSION)
  outputs = outputs.detach().cpu().float()
  audio_write('a.wav', outputs[0], MODEL.sample_rate, strategy="loudness",
                  loudness_headroom_db=16, loudness_compressor=True, add_suffix=False)

then I got

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB. GPU 0 has a total capacty of 14.57 GiB of which 2.75 MiB is free. Process 15403 has 7.45 GiB memory in use. Including non-PyTorch memory, this process has 7.11 GiB memory in use. Of the allocated memory 6.79 GiB is allocated by PyTorch, and 193.78 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

so what is the different between with and without Gradio?
and how to fix this? Thanks all.

facebookresearch / audiocraft

CUDA runs out of memory without Gradio, but there is no error if Gradio is used. #469