NVIDIA / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

[cudnn_conv_layer.cu:54] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered #531

Closed HaoLiuHust closed 5 years ago

HaoLiuHust commented 6 years ago

I updated to 0.17.1, and the issue is still there when using multiple threads. #515

drnikolaev commented 6 years ago

@HaoLiuHust I need to reproduce this. May I have your model please?

HaoLiuHust commented 6 years ago

@drnikolaev It is the MTCNN model. The error only happens when the model is run from multiple threads.

vbzhe commented 6 years ago

The problem occurs in these lines:
###########################################
// NOLINT_NEXT_LINE(whitespace/operators)
for (int ig = 0; ig < ws_groups(); ++ig) {
  CUDA_CHECK(cudaStreamSynchronize(Caffe::thread_stream(ig)));
}
############################################

HaoLiuHust commented 6 years ago

@vbzhe Did you get the same error?

cliffwoolley commented 6 years ago

This error code indicates that some kernel running on the GPU had an out-of-bounds memory access. We would need to run through cuda-memcheck in order to find out which one. That's why @drnikolaev asked for a complete reproducer - the whole prototxt as well as the versions of CUDA and cuDNN that you have installed.

Edit: We would also need to know which GPU you have.
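As a rough illustration of why the failing check alone does not identify the faulty kernel, here is a minimal standalone sketch. It is not taken from Caffe: the kernel `write_through` and the null-pointer write are hypothetical, chosen only to provoke an illegal access on purpose. Kernel launches are asynchronous, so the launch itself usually appears to succeed, and the error is reported by a later call such as `cudaStreamSynchronize`. Subsequent CUDA calls in the same process are then expected to fail with the same error.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration only (not part of Caffe):
// it writes through whatever pointer it is given, with no validity check.
__global__ void write_through(int* p) {
  *p = 42;
}

int main() {
  cudaStream_t stream;
  cudaStreamCreate(&stream);

  // Deliberately pass a null device pointer to provoke an illegal access.
  write_through<<<1, 1, 0, stream>>>(nullptr);

  // The launch is asynchronous: this call usually reports only
  // launch-configuration problems, not faults inside the kernel.
  cudaError_t launch_err = cudaGetLastError();
  printf("after launch: %s\n", cudaGetErrorString(launch_err));

  // The fault is expected to surface here, at the synchronization point,
  // far from the code that actually caused it.
  cudaError_t sync_err = cudaStreamSynchronize(stream);
  printf("after sync:   %s\n", cudaGetErrorString(sync_err));

  // An unrelated later call is expected to report the same error,
  // because this kind of error leaves the CUDA context unusable.
  int* d = nullptr;
  cudaError_t later_err = cudaMalloc(&d, sizeof(int));
  printf("after malloc: %s\n", cudaGetErrorString(later_err));

  return 0;
}
```

Running a program like this under cuda-memcheck (for example `cuda-memcheck ./a.out`, where `./a.out` is whatever binary name you built) should point at the offending kernel instead of the call that happened to check the error.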

jwnsu commented 5 years ago

I encountered the same error in a different location:
F0922 08:33:07.619784 30061 syncedmem.cpp:28] Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
*** Check failure stack trace: ***
    @ 0x7fbac55c25cd google::LogMessage::Fail()
    @ 0x7fbac55c4433 google::LogMessage::SendToLog()
    @ 0x7fbac55c215b google::LogMessage::Flush()
    @ 0x7fbac55c4e1e google::LogMessageFatal::~LogMessageFatal()
    @ 0x7fbac63da6ea caffe::SyncedMemory::FreeHost()
    @ 0x7fbac63da769 caffe::SyncedMemory::~SyncedMemory()
    @ 0x7fbac5f6ffbd boost::detail::sp_counted_impl_pd<>::dispose()
    @ 0x7fbac5f7f93a caffe::Tensor::Reshape()
    @ 0x7fbac5f6bcfd caffe::Blob::Reshape()
    @ 0x7fbac62cda31 caffe::InnerProductLayer<>::Reshape()
    @ 0x7fbac74348ea caffe::Layer<>::Forward()
    @ 0x7fbac63fde53 caffe::Net::ForwardFromTo()
    @ 0x7fbac63fdfb7 caffe::Net::Forward()
    @ 0x7fbac6402355 caffe::Net::ForwardBackward()
    @ 0x7fbac63f7bc7 caffe::Solver::Step()

cliffwoolley commented 5 years ago

@jwnsu - please see my previous comment https://github.com/NVIDIA/caffe/issues/531#issuecomment-418223977 . Can you provide this information, please?

HaoLiuHust commented 5 years ago

The model is not important; any model can reproduce it.