Check failed: error == cudaSuccess (8 vs. 0) invalid device function, out of memory

yghlc commented 7 years ago

I am running the code on Ubuntu 16.04 with Caffe built with GPU support (Quadro M4000, 8 GB), but I get the error below. Does anyone have an idea what is going on and how I can solve it? Many thanks for your help.

I0409 20:53:47.976828 21105 net.cpp:170] pool2 needs backward computation.
I0409 20:53:47.976835 21105 net.cpp:170] relu2_2 needs backward computation.
I0409 20:53:47.976843 21105 net.cpp:170] conv2_2 needs backward computation.
I0409 20:53:47.976853 21105 net.cpp:170] relu2_1 needs backward computation.
I0409 20:53:47.976861 21105 net.cpp:170] conv2_1 needs backward computation.
I0409 20:53:47.976869 21105 net.cpp:170] pool1 needs backward computation.
I0409 20:53:47.976878 21105 net.cpp:170] relu1_2 needs backward computation.
I0409 20:53:47.976887 21105 net.cpp:170] conv1_2 needs backward computation.
I0409 20:53:47.976897 21105 net.cpp:170] relu1_1 needs backward computation.
I0409 20:53:47.976907 21105 net.cpp:170] conv1_1 needs backward computation.
I0409 20:53:47.976915 21105 net.cpp:172] data does not need backward computation.
I0409 20:53:47.976925 21105 net.cpp:208] This network produces output accuracy
I0409 20:53:47.976969 21105 net.cpp:467] Collecting Learning Rate and Weight Decay.
I0409 20:53:47.976986 21105 net.cpp:219] Network initialization done.
I0409 20:53:47.976995 21105 net.cpp:220] Memory required for data: 9170170576
I0409 20:53:47.977139 21105 solver.cpp:41] Solver scaffolding done.
I0409 20:53:47.977152 21105 caffe.cpp:118] Finetuning from voc12/model/vgg128_noup/init.caffemodel
I0409 20:53:48.125612 21105 net.cpp:740] Target layer fc6 not initialized.
I0409 20:53:48.125658 21105 net.cpp:740] Target layer fc7 not initialized.
I0409 20:53:48.125666 21105 net.cpp:740] Target layer fc8_voc12 not initialized.
I0409 20:53:48.127007 21105 solver.cpp:160] Solving vgg128_noup
I0409 20:53:48.127018 21105 solver.cpp:161] Learning Rate Policy: step
F0409 20:53:48.199013 21105 im2col.cu:68] Check failed: error == cudaSuccess (8 vs. 0) invalid device function
Check failure stack trace:
@ 0x7fd71f3715cd google::LogMessage::Fail()
@ 0x7fd71f373433 google::LogMessage::SendToLog()
@ 0x7fd71f37115b google::LogMessage::Flush()
@ 0x7fd71f373e1e google::LogMessageFatal::~LogMessageFatal()
@ 0x5c878a caffe::im2col_gpu<>()
@ 0x5bd086 caffe::ConvolutionLayer<>::Forward_gpu()
@ 0x55aaba caffe::Net<>::ForwardFromTo()
@ 0x55aca7 caffe::Net<>::ForwardPrefilled()
@ 0x5aef94 caffe::Solver<>::Solve()
@ 0x41be0a train()
@ 0x415708 main
@ 0x7fd71c112830 __libc_start_main
@ 0x41a769 _start
@ (nil) (unknown)
Aborted (core dumped)

yghlc commented 7 years ago

Because my GPU is a Quadro M4000 and I installed CUDA 8.0, I changed the GPU setting in Makefile.config:
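The exact setting is not shown in this thread. As a hedged sketch only: "invalid device function" usually means the binary contains no kernel code for the GPU's architecture, and Caffe's Makefile.config selects architectures through `-gencode` entries in `CUDA_ARCH`. The helper below is hypothetical and simply formats such an entry. It assumes the Quadro M4000 has compute capability 5.2, which should be checked against NVIDIA's list of CUDA GPUs before use.

```python
def gencode_flag(major, minor):
    """Format an nvcc -gencode entry for one compute capability.

    Hypothetical helper for illustration; it is not part of Caffe.
    """
    cc = f"{major}{minor}"
    return f"-gencode arch=compute_{cc},code=sm_{cc}"


if __name__ == "__main__":
    # Assumption: Quadro M4000 = compute capability 5.2 (Maxwell).
    print("CUDA_ARCH := " + gencode_flag(5, 2))
```

After changing `CUDA_ARCH`, Caffe has to be rebuilt from a clean state so that the kernels are recompiled for the new architecture.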

The "Check failed: error == cudaSuccess (8 vs. 0) invalid device function" error disappeared, but now I get a new one: Check failed: error == cudaSuccess (2 vs. 0) out of memory. The error message:

I0409 21:17:24.408285 31319 net.cpp:219] Network initialization done.
I0409 21:17:24.408291 31319 net.cpp:220] Memory required for data: 9170170576
I0409 21:17:24.408380 31319 solver.cpp:41] Solver scaffolding done.
I0409 21:17:24.408390 31319 caffe.cpp:118] Finetuning from voc12/model/vgg128_noup/init.caffemodel
I0409 21:17:24.558290 31319 net.cpp:740] Target layer fc6 not initialized.
I0409 21:17:24.558336 31319 net.cpp:740] Target layer fc7 not initialized.
I0409 21:17:24.558352 31319 net.cpp:740] Target layer fc8_voc12 not initialized.
I0409 21:17:24.559871 31319 solver.cpp:160] Solving vgg128_noup
I0409 21:17:24.559886 31319 solver.cpp:161] Learning Rate Policy: step
F0409 21:17:26.948681 31319 syncedmem.cpp:51] Check failed: error == cudaSuccess (2 vs. 0) out of memory
Check failure stack trace:
@ 0x7f7ecfa295cd google::LogMessage::Fail()
@ 0x7f7ecfa2b433 google::LogMessage::SendToLog()
@ 0x7f7ecfa2915b google::LogMessage::Flush()
@ 0x7f7ecfa2be1e google::LogMessageFatal::~LogMessageFatal()
@ 0x532312 caffe::SyncedMemory::mutable_gpu_data()
@ 0x53d1ea caffe::Blob<>::mutable_gpu_diff()
@ 0x5bd4bd caffe::ConvolutionLayer<>::Backward_gpu()
@ 0x55b0c3 caffe::Net<>::BackwardFromTo()
@ 0x5aef9c caffe::Solver<>::Solve()
@ 0x41be0a train()
@ 0x415708 main
@ 0x7f7ecc7ca830 __libc_start_main
@ 0x41a769 _start
@ (nil) (unknown)
Aborted (core dumped)

It says out of memory, yet I have two GPUs, each with 8 GB. The model I am running is vgg128_noup.

This is disappointing: I thought 8 GB of memory would be enough for most cases, but it fails even with the default configuration of this GitHub project.

yytzjgsu commented 7 years ago

Just change the batch size to a lower value.

yghlc commented 7 years ago

@yytzjgsu Many thanks! I changed the batch size to 10 and it works. DeepLab now uses around 5 GB of GPU memory while running. The original batch size was 20 or 30.
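The log earlier in this thread reports `Memory required for data: 9170170576` (bytes). A rough back-of-the-envelope sketch in Python of why that does not fit on an 8 GB card, and how the figure might shrink with a smaller batch. It assumes, and this is an assumption rather than something stated in the thread, that the reported number grows roughly linearly with batch size. The function names are hypothetical, and the reported figure probably does not cover everything Caffe allocates on the GPU.

```python
# Value copied from the Caffe log line "Memory required for data: 9170170576".
REPORTED_BYTES = 9_170_170_576
GIB = 2 ** 30


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


def scaled_bytes(reported_bytes, old_batch, new_batch):
    """Estimate the data memory for a new batch size.

    Assumption: the reported figure scales linearly with batch size.
    """
    return reported_bytes * new_batch / old_batch


if __name__ == "__main__":
    print(f"reported: {to_gib(REPORTED_BYTES):.2f} GiB")
    # The thread says the original batch size was 20 or 30, so try both.
    for old_batch in (20, 30):
        estimate = scaled_bytes(REPORTED_BYTES, old_batch, 10)
        print(f"batch {old_batch} -> 10: about {to_gib(estimate):.2f} GiB")
```

The reported figure is already above 8 GiB before any other allocation, which is consistent with the out-of-memory failure. The estimates for batch size 10 come out below the roughly 5 GB observed, which would fit with additional memory being used for things this figure does not count.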

TheLegendAli / DeepLab-Context

Check failed: error == cudaSuccess (8 vs. 0) invalid device function, out of memory #28