LiJunnan1992 / MLNT

Meta-Learning based Noise-Tolerant Training

GPU runs out of memory with --batch_num=32 #12

Open mehrdadshoeiby opened 4 years ago

mehrdadshoeiby commented 4 years ago

Could you please help me out?

When I use --batch_num=32, I cannot run the code on a single GPU (my GPU is a Tesla P100-SXM2 16G); I can only run the code with --batch_num=1. Since the inner loop calls torch.autograd.grad with create_graph=True and retain_graph=True, and M=10, GPU memory fills up very quickly. It is also quite tricky to run MAML on multiple GPUs.
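To illustrate what I mean, here is a minimal sketch of that inner-loop pattern (this is not the repository's code; the model, dimensions, and hyperparameters are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(512, 10).cuda()      # stand-in for the real network
inner_lr, M = 0.1, 10                  # M plays the role of --num_fast

x = torch.randn(32, 512).cuda()        # 32 plays the role of the batch size
y = torch.randint(0, 10, (32,)).cuda()

fast_weights = list(model.parameters())
for _ in range(M):
    logits = F.linear(x, fast_weights[0], fast_weights[1])
    loss = F.cross_entropy(logits, y)
    # create_graph=True (and retain_graph=True) keeps each step's graph alive
    # so that second-order gradients can later flow through it; memory use
    # therefore grows with M and with the batch size.
    grads = torch.autograd.grad(loss, fast_weights,
                                create_graph=True, retain_graph=True)
    fast_weights = [w - inner_lr * g for w, g in zip(fast_weights, grads)]
```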

I am wondering: in your implementation, what batch size did you use, and how did you address the growing GPU memory allocation in the inner loop? Did you use multiple GPUs or a single GPU?

Thanks a lot.

LiJunnan1992 commented 4 years ago

Hi,

The code is not compatible with the latest PyTorch. I'll try to fix this issue. Thanks!

mehrdadshoeiby commented 4 years ago

Thanks. What PyTorch version does the current implementation target?

LiJunnan1992 commented 4 years ago

It should be able to run on PyTorch v0.4.

mehrdadshoeiby commented 4 years ago

Thank you very much, it is working now. However, it only works with --batch_size=12 (not 32) and --num_fast=5 (not 10). Could you please let me know which GPU you used?

mehrdadshoeiby commented 4 years ago

Also, in your opinion, what is the reason for the memory issues with PyTorch > v1.0? Maybe I can fix something myself if you could give me a hint. Thanks again.

LiJunnan1992 commented 4 years ago

I used a Titan X with 12 GB of memory. "Variable" and "volatile" are deprecated in newer PyTorch versions.
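For example, evaluation code that used to rely on volatile=True to skip graph construction no longer gets that effect in PyTorch 0.4+ and needs torch.no_grad() instead. A minimal sketch of the migration (illustrative, not the repository's code):

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 10).cuda()
inputs = torch.randn(32, 512).cuda()

# Old (<= 0.3) pattern; volatile is now ignored, so this no longer saves memory:
#   inputs = Variable(inputs, volatile=True)
#   outputs = model(inputs)

# Current equivalent: disable autograd explicitly during evaluation.
with torch.no_grad():
    outputs = model(inputs)
```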

6-git commented 1 year ago

> Could you please help me out?
>
> When I use --batch_num=32, I cannot run the code on a single GPU (my GPU is a Tesla P100-SXM2 16G); I can only run the code with --batch_num=1. Since the inner loop calls torch.autograd.grad with create_graph=True and retain_graph=True, and M=10, GPU memory fills up very quickly. It is also quite tricky to run MAML on multiple GPUs.
>
> I am wondering: in your implementation, what batch size did you use, and how did you address the growing GPU memory allocation in the inner loop? Did you use multiple GPUs or a single GPU?
>
> Thanks a lot.

Hello, did you solve the "CUDA out of memory" problem caused by batch_size=32? I also hit this issue when I ran main.py. My GPU is an NVIDIA GeForce 3080 24G.

What could I do to solve it? Thanks for your reply!

6-git commented 1 year ago

> I used a Titan X with 12 GB of memory. "Variable" and "volatile" are deprecated in newer PyTorch versions.

Hi, thanks for the nice implementation. When I ran main.py with batch_size=32, I got a "CUDA out of memory" error. My GPU is an NVIDIA GeForce 3080 24G.

What could I do to solve it? Thanks for your reply!