sy1019 closed 3 years ago
I'm running into the same problem!
Same here.
Is there any solution to this issue? We have the same problem at @d2l-ai.
I'm getting this warning in CI: MXNet reports a CUDA initialization error and then ignores it. I have no idea what's causing it. Here's a CI log; check the error under the "Execute Notebooks MXNet" job.
src/engine/threaded_engine_perdevice.cc:101: Ignore CUDA Error
/home/centos/mxnet/3rdparty/mshadow/mshadow/./tensor_gpu-inl.h:54: Check failed: e == cudaSuccess: CUDA: initialization error
Thanks in advance :))
[19:58:06] src/engine/threaded_engine_perdevice.cc:101: Ignore CUDA Error
[19:58:06] /home/ubuntu/mxnet-distro/mxnet-build/3rdparty/mshadow/mshadow/./tensor_gpu-inl.h:35: Check failed: e == cudaSuccess: CUDA: initialization error
Stack trace:
[bt] (0) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x6ccefb) [0x7fbdf3210efb]
[bt] (1) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x38ef552) [0x7fbdf6433552]
[bt] (2) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x3912c7e) [0x7fbdf6456c7e]
[bt] (3) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x3905641) [0x7fbdf6449641]
[bt] (4) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x38fb1d1) [0x7fbdf643f1d1]
[bt] (5) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x38fc124) [0x7fbdf6440124]
[bt] (6) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::Chunk::~Chunk()+0x3c2) [0x7fbdf666c9a2]
[bt] (7) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x6d06aa) [0x7fbdf32146aa]
[bt] (8) /data4/dyk/anaconda3/envs/dykDB/lib/python3.7/site-packages/mxnet/libmxnet.so(MXNDArrayFree+0x54) [0x7fbdf63a97d4]
The error above is displayed during training, but training is not interrupted. How can I avoid this error (warning)?