Open manogna-s opened 5 years ago
The second stage of training, at resolution 768x768, fails with the following error:

    F0903 14:31:26.106397 92421 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
    Check failure stack trace:
    @ 0x7fd08aa5c5cd google::LogMessage::Fail()
    @ 0x7fd08aa5e433 google::LogMessage::SendToLog()
    @ 0x7fd08aa5c15b google::LogMessage::Flush()
    @ 0x7fd08aa5ee1e google::LogMessageFatal::~LogMessageFatal()
    @ 0x7fd08b2290e0 caffe::SyncedMemory::to_gpu()
    @ 0x7fd08b2280a9 caffe::SyncedMemory::mutable_gpu_data()
    @ 0x7fd08b390282 caffe::Blob<>::mutable_gpu_data()
    @ 0x7fd08b363928 caffe::BaseConvolutionLayer<>::forward_gpu_gemm()
    @ 0x7fd08b3eb296 caffe::ConvolutionLayer<>::Forward_gpu()
    @ 0x7fd08b1f15f2 caffe::Net<>::ForwardFromTo()
    @ 0x7fd08b1f1717 caffe::Net<>::Forward()
    @ 0x7fd08b3a6eca caffe::Solver<>::Solve()
    @ 0x7fd08b226604 caffe::P2PSync<>::Run()
    @ 0x40ada0 train()
    @ 0x407590 main
    @ 0x7fd0899cc830 __libc_start_main
    @ 0x407db9 _start
    @ (nil) (unknown)
    Aborted (core dumped)

Has anyone come across this error and found a fix?
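
For context, the trace shows the GPU allocation failing inside a convolution layer's forward pass. Here is a rough sketch of how the memory for a single float32 blob scales with input resolution. The batch size, channel count, and the 384x384 comparison resolution are hypothetical values chosen for illustration, not taken from the actual network or training config; only 768x768 comes from this report.

```python
def blob_bytes(batch, channels, height, width, bytes_per_elem=4):
    """Bytes needed for one N x C x H x W blob of float32 values."""
    return batch * channels * height * width * bytes_per_elem


# Hypothetical layer shape (assumption, not read from the real prototxt).
BATCH, CHANNELS = 1, 64

low = blob_bytes(BATCH, CHANNELS, 384, 384)   # hypothetical lower resolution
high = blob_bytes(BATCH, CHANNELS, 768, 768)  # second-stage resolution from this issue

print(f"384x384: {low / 2**20:.1f} MiB")
print(f"768x768: {high / 2**20:.1f} MiB")
print(f"ratio:   {high / low:.1f}x")
```

Doubling both spatial dimensions quadruples the size of every full-resolution blob, so a configuration that fits at a lower resolution may no longer fit at 768x768 unless the batch size is reduced or a GPU with more memory is used.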