MVIG-SJTU / AlphaPose

Real-Time and Accurate Full-Body Multi-Person Pose Estimation&Tracking System
http://mvig.org/research/alphapose.html

CUDA error: out of memory (PyTorch) #94

Closed thomaswienecke closed 6 years ago

thomaswienecke commented 6 years ago

Hi there,

I had problems installing the Lua version, so I switched over to the PyTorch version.

My new problem is that I get a CUDA error: out of memory when executing demo.py on my own images. When I run the webcam demo or use fewer pictures, I don't get any errors and it works fine. I'm using a GTX 1080 with 8 GB of VRAM.

Could any of you help me?

Regards, Thomas

Stacktrace:

Traceback (most recent call last):
  File "demo.py", line 83, in <module>
    hm = pose_model(inps)
  File "/home/thomas/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/thomas/Documents/AlphaPose/SPPE/src/main_fast_inference.py", line 67, in forward
    out = self.pyranet(x)
  File "/home/thomas/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/thomas/Documents/AlphaPose/SPPE/src/models/FastPose.py", line 32, in forward
    out = self.duc2(out)
  File "/home/thomas/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/thomas/Documents/AlphaPose/SPPE/src/models/layers/DUC.py", line 19, in forward
    x = self.conv(x)
  File "/home/thomas/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/thomas/.local/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: out of memory
Fang-Haoshu commented 6 years ago

Can you show an example image? How many people are there in it?

thomaswienecke commented 6 years ago

Actually, I'm not allowed to show example images because of my NDA, sorry!

There are between 5 and 147 people in an image, split into two categories: some pictures with around 6 people and some with around 100 people.

I have now tested it without the 100-people pictures and it worked. Is there a chance to make this work for more people?

Fang-Haoshu commented 6 years ago

Oh, I see. Let me think about it.

thomaswienecke commented 6 years ago

Okay. Thanks!

P.S.: Would there be a way to test whether it crashes in YOLO or in SPPE?

Fang-Haoshu commented 6 years ago

It crashes in SPPE, since we put all detected persons into one batch for the SPPE network. You can try adding a for loop there to limit the batch size.

thomaswienecke commented 6 years ago

Can you give me a hint on how to do this? I tried to split the tensor before calling hm = pose_model(inps), but I didn't succeed.

Fang-Haoshu commented 6 years ago

Hi, I think it should work. I will look into it today.

Fang-Haoshu commented 6 years ago
    batchSize = 30
    datalen = inps.size(0)
    # Round up so a partial final batch is still processed
    leftover = 0
    if datalen % batchSize:
        leftover = 1
    num_batches = datalen // batchSize + leftover
    hm = []
    for j in range(num_batches):
        # Run SPPE on one slice of the detected persons at a time
        inps_j = Variable(inps[j * batchSize:min((j + 1) * batchSize, datalen)].cuda())
        hm_j = pose_model(inps_j)
        hm.append(hm_j)
    hm = torch.cat(hm)

Hi, this works for me. Please take a try. I will add that in next commit.
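For anyone adapting this to their own pipeline, the slicing arithmetic in the snippet above can be pulled out into a standalone helper. This is a minimal sketch; `chunk_indices` is a hypothetical name, not part of AlphaPose, and the numbers below are purely illustrative:

```python
def chunk_indices(datalen, batch_size):
    """Yield (start, end) slice bounds covering datalen items in
    batches of at most batch_size, including a partial last batch."""
    leftover = 1 if datalen % batch_size else 0
    num_batches = datalen // batch_size + leftover
    for j in range(num_batches):
        yield j * batch_size, min((j + 1) * batch_size, datalen)

# Example: 147 detected persons with batch size 30 -> 5 batches,
# the last one holding the 27 leftover persons.
bounds = list(chunk_indices(147, 30))
# -> [(0, 30), (30, 60), (60, 90), (90, 120), (120, 147)]
```

Each `(start, end)` pair is then used to slice `inps` before the forward pass, exactly as in the loop above.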

thomaswienecke commented 6 years ago

Thanks for the update. As far as I can see, it works perfectly fine! :+1: I only had to reduce the batch size to 10 and turn on --vis_fast, but I think that just depends on the hardware.

Fang-Haoshu commented 6 years ago

It seems 10 is too small. Can you please check whether there is a memory leak in the code? In the next version I fixed the memory leaks, and the batch size can be 80 on a 1080Ti card.

Fang-Haoshu commented 6 years ago

@thomasdissert Hi, can you try the new version to see if it works better?

Fang-Haoshu commented 6 years ago

Also, can you help test the best values of detbatch and posebatch on an 8 GB card? I only have a 1080Ti card and cannot test it. Later I will post the parameters in the README so others can follow the guidelines to best utilize their GPU memory.

husnejahan commented 5 years ago

I am getting the error below: RuntimeError: CUDA error: out of memory. How can I check if there is a memory leak in the code?
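One way to check (a sketch, not code from this repo): record `torch.cuda.memory_allocated()` after each processed image and see whether the reading keeps climbing after the model has warmed up. A plateau is normal; steady growth suggests tensors are being retained across iterations. The growth check itself is plain Python, and `looks_like_leak` is a hypothetical helper name:

```python
def looks_like_leak(readings, warmup=3):
    """Heuristic: after a warm-up period, allocated GPU memory should
    plateau; a strictly increasing tail suggests tensors are being
    retained across iterations (e.g. appending outputs that still
    hold their autograd graph instead of detached copies)."""
    tail = readings[warmup:]
    return len(tail) > 1 and all(b > a for a, b in zip(tail, tail[1:]))

# In practice, readings would come from torch.cuda.memory_allocated()
# after each image; the numbers here are illustrative only.
steady = [100, 180, 200, 200, 200, 200]    # normal warm-up, then flat
leaking = [100, 180, 200, 220, 240, 260]   # keeps growing every image
```

Running inference inside `torch.no_grad()` (or detaching outputs before accumulating them) is the usual fix when the readings keep growing.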