VisionLearningGroup / R-C3D

code for R-C3D
MIT License

out of memory! Could you please tell me your GPU card type? #33

Open sijun-zhou opened 6 years ago

sijun-zhou commented 6 years ago

Hi, Huijuan @huijuan88
I am using a 1080 Ti with 11 GB of memory, but 2.5 GB is in use by other students, so I am left with only 8.5 GB of GPU memory. When I run the test script on ActivityNet with your provided script, it loads the frames of just one video (768 images) and then runs out of memory at the step blobs_out = net.forward(**forward_kwargs):
"""
F0713 15:08:15.452706 22317 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
Aborted (core dumped)
"""

So could you please tell me which GPU type you used, and how many GPUs, when testing and training this code? Thanks in advance!

sijun-zhou commented 6 years ago

I reduced the input from 768 images to 160 images, and it works fine with the 8.5 GB of memory I have left. But 768 images is nearly 5 times larger, so I guess I would need 40 GB to 50 GB of GPU memory. And it is difficult to run pycaffe with multiple GPUs. Could you please help me? I am new to action detection. Much appreciated!
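The estimate in this comment can be reproduced with a few lines of arithmetic. This is only a rough sketch: it assumes memory grows linearly with the number of frames and that the 160-frame run used the full 8.5 GB, neither of which the thread confirms. Model weights do not grow with the input, so the true requirement is probably lower; later in the thread the same poster recalls roughly 5-6 GB once cuDNN was enabled. The function name is hypothetical.

```python
def linear_memory_estimate(frames_ok, mem_ok_gb, frames_target):
    """Scale a known-good memory budget linearly with the frame count.

    Assumption: all memory is activation memory proportional to the
    number of frames. This overestimates, since weights stay constant.
    """
    return mem_ok_gb * frames_target / frames_ok


ratio = 768 / 160                                   # "nearly 5 times larger"
estimate = linear_memory_estimate(160, 8.5, 768)    # figures from the comment
print(f"frame ratio: {ratio:.1f}x, linear estimate: {estimate:.1f} GB")
```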

YanYan0716 commented 6 years ago

@sijun-zhou Hello, I have run into the same problem. Have you solved it? I am also new to action detection. Thanks a lot.

sijun-zhou commented 6 years ago

@yanqian123 I used a single 1080 Ti. My problem was solved when I enabled cuDNN for the project.

YanYan0716 commented 6 years ago

@sijun-zhou In Makefile.config, set CUDNN==1, right? Thanks again.

sijun-zhou commented 6 years ago

@yanqian123 yes
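For reference, stock Caffe enables cuDNN through the line `USE_CUDNN := 1` in Makefile.config (it ships commented out), and the change only takes effect after a rebuild. The `CUDNN==1` above reads as shorthand for that setting. Assuming the Caffe fork bundled with R-C3D follows the same convention, a small check like the following can confirm the line is really uncommented. The helper name is hypothetical.

```python
import re


def cudnn_enabled(makefile_config_text):
    """Return True if an uncommented `USE_CUDNN := 1` line is present.

    Assumption: the build uses stock Caffe's Makefile.config syntax.
    """
    pattern = re.compile(r"^\s*USE_CUDNN\s*:=\s*1\s*(#.*)?$")
    return any(pattern.match(line)
               for line in makefile_config_text.splitlines())


# Illustrative fragments, not copied from the repository.
print(cudnn_enabled("# USE_CUDNN := 1\n"))   # still commented out
print(cudnn_enabled("USE_CUDNN := 1\n"))     # enabled
```

To check a real build, pass the contents of the Makefile.config that was actually used to compile Caffe.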

YanYan0716 commented 6 years ago

I am sorry to say it did not work. My GPU is a 1050, and setting CUDNN==1 did not solve my problem. Could you give me some advice?

YanYan0716 commented 6 years ago

@sijun-zhou thank you again

sijun-zhou commented 6 years ago

@yanqian123 As far as I remember, if you do not change the batch size (700+? I don't remember it clearly), it will consume approximately 5-6 GB of GPU memory. Obviously a 1050 cannot support that.

viswalal commented 5 years ago

I am getting the same out of memory error while testing.

F1115 21:40:12.954958 25933 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
Check failure stack trace:

Is there any way to handle this, such as reducing the batch size or the number of frames? GPU: GeForce 940MX (4 GB only)

Xchangjiang commented 5 years ago

@viswalal Hello, I have run into the same problem. I think it is due to a mismatch in the number of GPUs. I only have one GPU, but the setting is 'GPU_ID: 1' when it should be 'GPU_ID: 0'. However, I can't find the config file. Have you solved it?

viswalal commented 5 years ago

@Xchangjiang Hi, I also have only one GPU. For me, the GPU ID shows up as 0 in the log when running script_test.sh, and I have not been able to resolve the error. I checked GPU usage while the test was running: it keeps increasing, and the process crashes when memory is full. I have not been able to reduce the batch size because I cannot find where to change it.
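One way to see which GPU index has free memory before launching the test is to query `nvidia-smi`. The sketch below assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` and `--format` options. The function names are hypothetical, and the sample figures are illustrative only. The position of each row usually matches the device index passed as GPU_ID.

```python
import subprocess


def parse_memory_csv(csv_text):
    """Parse 'used, total' MiB pairs, one GPU per line."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory (requires the NVIDIA driver tools)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_memory_csv(out)


# Illustrative output for a single card with other jobs running on it.
for gpu_id, (used, total) in enumerate(parse_memory_csv("2500, 11178\n")):
    print(f"GPU {gpu_id}: {total - used} MiB free of {total} MiB")
```

On a machine with a GPU, replace the sample string with a call to `query_gpu_memory()`.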

huijuan88 commented 5 years ago

You can change GPU_ID in the file “script_train.sh”.

On Nov 26, 2018, at 09:56, viswalal <notifications@github.com> wrote:

@Xchangjiang Hi, I also have only one GPU. For me, GPU ID is coming as 0 in log while running script_test.sh. I am not able to resolve it

viswalal commented 5 years ago

@huijuan88 Hello. I think GPU ID 0 is correct for me. The test crashes because my GPU has only 4 GB. I want to change the batch size used by script_test.sh, the way 'batch_size' is set in a network definition prototxt file (or maybe reducing the number of frames it loads at a time will help).

huijuan88 commented 5 years ago

You can change the buffer size in “td_cnn_end2end.yml”. LENGTH: [768]

You also need to change the data process file to make everything consistent.
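A minimal sketch of the first half of this suggestion, assuming the config is plain YAML text containing an entry of the form `LENGTH: [768]`. Only the key name and value come from the reply above. The surrounding `TRAIN:` key in the sample and the function name are hypothetical. The data processing script still has to be rerun with the same length, as noted.

```python
import re


def set_length(yml_text, new_length):
    """Replace every `LENGTH: [N]` entry with `LENGTH: [new_length]`."""
    return re.sub(
        r"(\bLENGTH:\s*\[)\s*\d+\s*(\])",
        lambda m: f"{m.group(1)}{new_length}{m.group(2)}",
        yml_text,
    )


sample = "TRAIN:\n  LENGTH: [768]\n"   # hypothetical fragment of the .yml
print(set_length(sample, 256))
```

Working on the text rather than a parsed YAML object keeps comments and key order in the file untouched.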

On Nov 26, 2018, at 22:49, viswalal <notifications@github.com> wrote:

@huijuan88 , Hello.. I think the GPU ID 0 is correct for me. Since my GPU is only 4 GB it is getting crashed. I want to change the batch size for running script_test.sh like we set 'batch_size' in the network definition prototxt file. ( or maybe reducing the number of frames it loads at a time will help).

viswalal commented 5 years ago

@huijuan88 Thank you. I will try that.

viswalal commented 5 years ago

@huijuan88 Hi, I have tried length = 256, 128, 64 and 32, and I also regenerated the data (by editing generate_roidb_512.py and rerunning it), but I still get the same error. I am stuck at this point.

huijuan88 commented 5 years ago

The error is about memory, but the model should fit with such a small length, e.g. 32.

On Nov 27, 2018, at 22:36, viswalal <notifications@github.com> wrote:

@huijuan88 hi, I have tried with length=256,128,64 and 32 and changed the data generation also (by editing generate_roidb_512.py and running the same) still getting the same error. I am stuck at this point.

mxguo commented 5 years ago

@viswalal Hi, I have also run into this problem, and the error is still there even though I tried length = 256, 128, 64, 32 and 16 in generate_roidb_512.py. Have you solved this problem? Much appreciated!