dmlc / gluon-cv

Gluon CV Toolkit
http://gluon-cv.mxnet.io
Apache License 2.0
5.83k stars 1.22k forks

CUDA Error:src/engine/threaded_engine_perdevice.cc:101: Ignore CUDA Error [19:58:06] /home/ubuntu/mxnet-distro/mxnet-build/3rdparty/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess: CUDA: initialization error #1145

Closed: sy1019 closed this issue 3 years ago

sy1019 commented 4 years ago

[19:58:06] src/engine/threaded_engine_perdevice.cc:101: Ignore CUDA Error
[19:58:06] /home/ubuntu/mxnet-distro/mxnet-build/3rdparty/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess: CUDA: initialization error
Stack trace:
  [bt] (0) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x6ccefb) [0x7fbdf3210efb]
  [bt] (1) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x38ef552) [0x7fbdf6433552]
  [bt] (2) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x3912c7e) [0x7fbdf6456c7e]
  [bt] (3) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x3905641) [0x7fbdf6449641]
  [bt] (4) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x38fb1d1) [0x7fbdf643f1d1]
  [bt] (5) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x38fc124) [0x7fbdf6440124]
  [bt] (6) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::Chunk::~Chunk()+0x3c2) [0x7fbdf666c9a2]
  [bt] (7) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x6d06aa) [0x7fbdf32146aa]
  [bt] (8) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(MXNDArrayFree+0x54) [0x7fbdf63a97d4]


The error above is printed during training, but training is not interrupted. How can I avoid this error (warning)?

williamzhao95 commented 4 years ago

I'm running into the same problem!

waflessnet commented 4 years ago

Same here.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

AnirudhDagar commented 2 years ago

Is there any solution to this issue? In @d2l-ai, we have the same problem.

In CI, I'm getting this warning, which reports a CUDA initialization error and then ignores it. I have no idea what's causing it. Here's a CI log; check the error under the "Execute Notebooks MXNet" job.

src/engine/threaded_engine_perdevice.cc:101: Ignore CUDA Error
/home/centos/mxnet/3rdparty/mshadow/mshadow/./tensor_gpu-inl.h:54: Check failed: e == cudaSuccess: CUDA: initialization error

Thanks in advance :))