Closed HaoLiuHust closed 5 years ago
@HaoLiuHust I need to reproduce this. May I have your model please?
@drnikolaev It is the MTCNN model. The error only happens when the model is run in multiple threads.
The problem occurs in these lines:
###########################################
// NOLINT_NEXT_LINE(whitespace/operators)
for (int ig = 0; ig < ws_groups(); ++ig) {
  CUDA_CHECK(cudaStreamSynchronize(Caffe::thread_stream(ig)));
}
############################################
@vbzhe you got the same error?
This error code indicates that some kernel running on the GPU made an out-of-bounds memory access. We would need to run the program under cuda-memcheck to find out which kernel it was. That's why @drnikolaev asked for a complete reproducer: the whole prototxt as well as the versions of CUDA and cuDNN that you have installed.
Edit: We would also need to know which GPU you have.
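For readers wondering why the check fails inside a cudaStreamSynchronize call rather than at the faulty kernel itself, here is a minimal standalone sketch. It is not Caffe code: the kernel `bad_write`, the struct `DemoResult`, and the function `provoke_illegal_access` are hypothetical names made up for illustration, and the null-pointer store is just an assumed way to provoke the fault. Kernel launches are asynchronous, so the error is normally reported by a later synchronizing call, and it then stays attached to the context so that later, unrelated calls report it too.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical faulty kernel: stores through whatever pointer it receives.
__global__ void bad_write(int* p) {
  *p = 42;
}

// Statuses collected at three points (illustrative struct, not a CUDA type).
struct DemoResult {
  cudaError_t after_launch;  // status queried right after the launch
  cudaError_t after_sync;    // status returned by the synchronizing call
  cudaError_t after_free;    // status returned by a later, unrelated call
};

DemoResult provoke_illegal_access() {
  DemoResult r;
  void* pinned = nullptr;
  cudaMallocHost(&pinned, 16);  // unrelated pinned host buffer

  // Passing a null device pointer makes the store an illegal access.
  bad_write<<<1, 1>>>(nullptr);

  // The launch is asynchronous, so this usually still reports success.
  r.after_launch = cudaGetLastError();

  // The fault surfaces here, once the host waits for the kernel.
  r.after_sync = cudaDeviceSynchronize();

  // The context is now in an error state; later calls tend to fail as well.
  r.after_free = cudaFreeHost(pinned);
  return r;
}

int main() {
  DemoResult r = provoke_illegal_access();
  std::printf("after launch: %s\n", cudaGetErrorString(r.after_launch));
  std::printf("after sync:   %s\n", cudaGetErrorString(r.after_sync));
  std::printf("after free:   %s\n", cudaGetErrorString(r.after_free));
  return 0;
}
```

Because of this, the line named in the failure message only tells us where the error was noticed, not which kernel caused it. Building the sketch with nvcc and running the binary under cuda-memcheck should point at the offending store in `bad_write`; the same approach applied to the real reproducer is what we need here.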
I encountered the same error in a different location:
F0922 08:33:07.619784 30061 syncedmem.cpp:28] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
*** Check failure stack trace: ***
    @ 0x7fbac55c25cd google::LogMessage::Fail()
    @ 0x7fbac55c4433 google::LogMessage::SendToLog()
    @ 0x7fbac55c215b google::LogMessage::Flush()
    @ 0x7fbac55c4e1e google::LogMessageFatal::~LogMessageFatal()
    @ 0x7fbac63da6ea caffe::SyncedMemory::FreeHost()
    @ 0x7fbac63da769 caffe::SyncedMemory::~SyncedMemory()
    @ 0x7fbac5f6ffbd boost::detail::sp_counted_impl_pd<>::dispose()
    @ 0x7fbac5f7f93a caffe::Tensor::Reshape()
    @ 0x7fbac5f6bcfd caffe::Blob::Reshape()
    @ 0x7fbac62cda31 caffe::InnerProductLayer<>::Reshape()
    @ 0x7fbac74348ea caffe::Layer<>::Forward()
    @ 0x7fbac63fde53 caffe::Net::ForwardFromTo()
    @ 0x7fbac63fdfb7 caffe::Net::Forward()
    @ 0x7fbac6402355 caffe::Net::ForwardBackward()
    @ 0x7fbac63f7bc7 caffe::Solver::Step()
@jwnsu - please see my previous comment https://github.com/NVIDIA/caffe/issues/531#issuecomment-418223977 . Can you provide this information, please?
The model is not important; any model can reproduce it.
I updated to 0.17.1, and the issue is still there when using multiple threads. #515