effusiveperiscope / so-vits-svc


RuntimeError: CUDA out of memory. #6

Closed Desuka-art closed 1 year ago

Desuka-art commented 1 year ago

I keep getting CUDA out of memory errors and I don't know what to do.

effusiveperiscope commented 1 year ago

How much VRAM do you have?

Desuka-art commented 1 year ago

24 GB of VRAM. I have a 3090. By the way, I have another issue. My CPU is taking the brunt of the training for So-vits, and I don't know how to optimize the speed.

effusiveperiscope commented 1 year ago
  1. How large is the dataset you are using?
  2. Have you tried changing the batch_size in preprocess_flist_config.py (you also need to re-run it)?

Desuka-art commented 1 year ago

My dataset is 10 minutes long. 64 files.

No, I have not. What number should I change the batch_size to?

effusiveperiscope commented 1 year ago

That's odd; you shouldn't be running out of VRAM with a dataset that size, but you could try lowering the batch_size to 8. What OS are you using? Can you post logs/errors from before the OOM?
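
A minimal sketch of that change, assuming the batch size ends up in the generated configs/config.json under train.batch_size (the usual so-vits-svc layout; adjust the path or keys if your setup differs):

```python
# Sketch only: lower train.batch_size in the generated config.
# Assumes configs/config.json has a "train" section with "batch_size";
# adjust the path/keys if your fork lays it out differently.
import json

cfg_path = "configs/config.json"
with open(cfg_path, "r", encoding="utf-8") as f:
    cfg = json.load(f)

print("current batch_size:", cfg["train"]["batch_size"])
cfg["train"]["batch_size"] = 8  # smaller value -> less VRAM per training step

with open(cfg_path, "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2)
```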

Also -- does your system have an integrated GPU?
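
One way to rule the integrated GPU in or out on the PyTorch side is to list the CUDA devices PyTorch actually sees (a diagnostic sketch, not something from this repo; an Intel iGPU won't appear because it isn't a CUDA device):

```python
# Diagnostic sketch: list the CUDA devices visible to PyTorch.
# Only NVIDIA CUDA devices show up here, so an integrated Intel GPU is excluded.
import torch

print("CUDA available:", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```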

Desuka-art commented 1 year ago

I'm using Windows 10. My CPU is an i7-8700. I believe I may have an integrated GPU? I'm not sure? Intel UHD Graphics 630. What do you think? I have tried redirecting the directory, to no avail. I have no idea what I'm doing wrong.

effusiveperiscope commented 1 year ago

Ok -- I thought you might be using the CPU-only PyTorch build, but then it occurred to me that you probably wouldn't get a CUDA OOM if you were. I'm not too sure what's going on here either.

Desuka-art commented 1 year ago

I'm checking it using the Task Manager. It says specifically I'm using CUDA.
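
(A quick way to cross-check what Task Manager reports is to ask PyTorch itself for free vs. total VRAM; this is just a diagnostic sketch and assumes a recent PyTorch build that exposes torch.cuda.mem_get_info.)

```python
# Sketch: report free/total VRAM as PyTorch sees it (recent PyTorch builds).
import torch

free_b, total_b = torch.cuda.mem_get_info()
print(f"free: {free_b / 1e9:.1f} GB / total: {total_b / 1e9:.1f} GB")
print(f"allocated by this process: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
```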

The maximum length is 10 seconds.

effusiveperiscope commented 1 year ago

Ok.

Desuka-art commented 1 year ago

nvidia-smi shows I have CUDA installed. 12.0

No, I am not.

effusiveperiscope commented 1 year ago

AmoArt commented 1 year ago

Try running these lines in a terminal and see what responses you get:

python -c "import torch; print(torch.cuda.is_available())"
python -c "import torch; print(torch.version.cuda)"
python -c "import torch; print(torch.zeros(1).cuda())"

Desuka-art commented 1 year ago

True
11.7
tensor([0.], device='cuda:0')

effusiveperiscope commented 1 year ago

Still OOMing? EDIT--I checked my nvcc --version and it's actually 11.4, sorry

Desuka-art commented 1 year ago

So... do I just... change my cuda version then to 11.4? What would be the command for it?

effusiveperiscope commented 1 year ago

What is your nvcc --version? You would have to uninstall CUDA (I think you can do this through the Control Panel) and replace it with the desired version.

Desuka-art commented 1 year ago

Related question, do I remove the ddp line? Where do I do that? I only have one GPU, a 3090.

effusiveperiscope commented 1 year ago

I only have one GPU as well and I do not have to make any changes to the code to train. If you run nvcc --version in the Command Prompt or PowerShell it should spit out some text about your CUDA version.
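
For anyone comparing the two numbers side by side, a small sketch that prints both the CUDA runtime PyTorch was built against and what nvcc reports (assumes nvcc is on PATH; nothing here is specific to this repo):

```python
# Sketch: compare PyTorch's bundled CUDA runtime version with the installed toolkit.
import subprocess
import torch

print("torch:", torch.__version__, "built for CUDA", torch.version.cuda)
try:
    out = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    for line in out.stdout.splitlines():
        if "release" in line:
            print("nvcc:", line.strip())
except FileNotFoundError:
    print("nvcc not found on PATH (the CUDA toolkit may not be installed)")
```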