Open amrintdv opened 4 years ago
I met this problem before and solved it by reinstalling my NVIDIA driver. The problem may be that your CUDA version does not match your NVIDIA driver. To check, you can test it as below:
sudo pip install mxnet-cu80 (if the CUDA version on your machine is 8.0)
# test.py
import mxnet as mx
mx.nd.array([0], ctx = mx.gpu())
Then run the script. If you get an error like this:
mxnet.base.MXNetError: [14:40:28] src/storage/storage.cc:119: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading
CUDA: CUDA driver version is insufficient for CUDA runtime version
that means you need to reinstall your NVIDIA driver. Detailed information is on NVIDIA's site, in Table 1, "CUDA Toolkit and Compatible Driver Versions":
CUDA Toolkit | Linux x86_64 Driver Version | Windows x86_64 Driver Version |
---|---|---|
CUDA 10.2.89 | >= 440.33 | >= 441.22 |
CUDA 10.1 (10.1.105 general) | >= 418.39 | >= 418.96 |
CUDA 10.0.130 | >= 410.48 | >= 411.31 |
CUDA 9.2 (9.2.148 Update 1) | >= 396.37 | >= 398.26 |
CUDA 9.2 (9.2.88) | >= 396.26 | >= 397.44 |
CUDA 9.1 (9.1.85) | >= 390.46 | >= 391.29 |
CUDA 9.0 (9.0.76) | >= 384.81 | >= 385.54 |
CUDA 8.0 (8.0.61 GA2) | >= 375.26 | >= 376.51 |
CUDA 8.0 (8.0.44) | >= 367.48 | >= 369.30 |
CUDA 7.5 (7.5.16) | >= 352.31 | >= 353.66 |
CUDA 7.0 (7.0.28) | >= 346.46 | >= 347.62 |
Hope this can help you
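The lookup in the table above can be sketched in Python. This is a minimal illustration, not an official tool: the helper names (`parse_version`, `driver_is_sufficient`, `installed_driver_version`) are made up for this example, the minimum versions are copied from the Linux x86_64 column of the table, and it assumes `nvidia-smi` is on the PATH.

```python
import subprocess

# Minimum Linux x86_64 driver versions, copied from the table above.
# Keys are the CUDA toolkit release numbers listed in the table.
MIN_LINUX_DRIVER = {
    "10.2.89": "440.33",
    "10.1.105": "418.39",
    "10.0.130": "410.48",
    "9.2.148": "396.37",
    "9.2.88": "396.26",
    "9.1.85": "390.46",
    "9.0.76": "384.81",
    "8.0.61": "375.26",
    "8.0.44": "367.48",
    "7.5.16": "352.31",
    "7.0.28": "346.46",
}


def parse_version(text):
    """Turn a string such as '384.81' into (384, 81) for numeric comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_sufficient(driver_version, cuda_release):
    """Return True if the driver meets the table's minimum for this CUDA release."""
    minimum = MIN_LINUX_DRIVER[cuda_release]
    return parse_version(driver_version) >= parse_version(minimum)


def installed_driver_version():
    """Ask nvidia-smi for the driver version reported for the first GPU."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
    )
    return out.decode().splitlines()[0].strip()
```

For example, `driver_is_sufficient(installed_driver_version(), "8.0.61")` returning False would be consistent with the "driver version is insufficient" error for an mxnet-cu80 build, assuming that build was compiled against CUDA 8.0 GA2.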
I also met this problem before. I reinstalled the NVIDIA driver through the PPA rather than with ''sudo ./NVIDIA ***.run --no-opengl-files'', and then everything worked fine.
@amrintdv Hello, I have the same problem as you. Could you tell me how you solved it? My error is:
src/engine/./../common/cuda_utils.h:310: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: CUDA: invalid device ordinal
and not:
mxnet.base.MXNetError: [14:40:28] src/storage/storage.cc:119: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: CUDA driver version is insufficient for CUDA runtime version.
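The "invalid device ordinal" message is a different failure from the driver mismatch: CUDA reports it when a GPU index is requested that it cannot see, for instance `mx.gpu(1)` on a single-GPU machine or when `CUDA_VISIBLE_DEVICES` hides some devices. Whether that is the cause here cannot be told from the message alone. A rough sketch of a pre-flight check follows; the helper names are hypothetical, and it assumes an MXNet version that provides `mx.context.num_gpus()` (1.3 or later, as far as I know).

```python
def out_of_range_ordinals(requested_ids, gpu_count):
    """Return the requested GPU ids that fall outside 0 .. gpu_count - 1."""
    return [i for i in requested_ids if i < 0 or i >= gpu_count]


def contexts_or_raise(requested_ids):
    """Build MXNet GPU contexts, failing early with a readable message."""
    # Imported here so the helper above can be used without MXNet installed.
    import mxnet as mx

    # Assumption: num_gpus() exists in the installed MXNet release.
    count = mx.context.num_gpus()
    bad = out_of_range_ordinals(requested_ids, count)
    if bad:
        raise ValueError(
            "GPU ids %s requested but only %d GPU(s) are visible" % (bad, count)
        )
    return [mx.gpu(i) for i in requested_ids]
```

If a training script is started with more GPU ids than the machine exposes, a check like this fails before the executor is bound, which is easier to read than the C++ stack trace.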
I have this problem too. Can you help me? Thank you very much!
call reset()
Traceback (most recent call last):
File "train.py", line 378, in
main()
File "train.py", line 375, in main
train_net(args)
File "train.py", line 370, in train_net
epoch_end_callback = epoch_cb )
File "/home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/module/base_module.py", line 498, in fit
for_training=True, force_rebind=force_rebind)
File "/home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/module/module.py", line 429, in bind
state_names=self._state_names)
File "/home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/module/executor_group.py", line 279, in init
self.bind_exec(data_shapes, label_shapes, shared_group)
File "/home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/module/executor_group.py", line 375, in bind_exec
shared_group))
File "/home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/module/executor_group.py", line 662, in _bind_ith_exec
shared_buffer=shared_data_arrays, *input_shapes)
File "/home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/symbol/symbol.py", line 1629, in simple_bind
raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (4, 3, 112, 112)
softmax_label: (4,)
[13:19:41] src/engine/./../common/cuda_utils.h:310: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: CUDA: invalid device ordinal
Stack trace:
[bt] (0) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x4b03ab) [0x7f031c9d03ab]
[bt] (1) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x25cc459) [0x7f031eaec459]
[bt] (2) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2e64d96) [0x7f031f384d96]
[bt] (3) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2e6a96f) [0x7f031f38a96f]
[bt] (4) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::NDArray(mxnet::TShape const&, mxnet::Context, bool, int)+0x5d0) [0x7f031ea515b0]
[bt] (5) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::common::InitZeros(mxnet::NDArrayStorageType, mxnet::TShape const&, mxnet::Context const&, int)+0x5c) [0x7f031eb052ac]
[bt] (6) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::common::ReshapeOrCreate(std::string const&, mxnet::TShape const&, int, mxnet::NDArrayStorageType, mxnet::Context const&, std::unordered_map<std::string, mxnet::NDArray, std::hash, std::equal_to, std::allocator<std::pair<std::string const, mxnet::NDArray> > > , bool)+0x3a1) [0x7f031eb189f1]
[bt] (7) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::exec::GraphExecutor::InitArguments(nnvm::IndexedGraph const&, std::vector<mxnet::TShape, std::allocator > const&, std::vector<int, std::allocator > const&, std::vector<int, std::allocator > const&, std::vector<mxnet::Context, std::allocator > const&, std::vector<mxnet::Context, std::allocator > const&, std::vector<mxnet::Context, std::allocator > const&, std::vector<mxnet::OpReqType, std::allocator > const&, std::unordered_set<std::string, std::hash, std::equal_to, std::allocator > const&, mxnet::Executor const, std::unordered_map<std::string, mxnet::NDArray, std::hash, std::equal_to, std::allocator<std::pair<std::string const, mxnet::NDArray> > > , std::vector<mxnet::NDArray, std::allocator >, std::vector<mxnet::NDArray, std::allocator > , std::vector<mxnet::NDArray, std::allocator >)+0xb10) [0x7f031eb209c0]
[bt] (8) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::exec::GraphExecutor::Init(nnvm::Symbol, mxnet::Context const&, std::map<std::string, mxnet::Context, std::less, std::allocator<std::pair<std::string const, mxnet::Context> > > const&, std::vector<mxnet::Context, std::allocator > const&, std::vector<mxnet::Context, std::allocator > const&, std::vector<mxnet::Context, std::allocator > const&, std::unordered_map<std::string, mxnet::TShape, std::hash, std::equal_to, std::allocator<std::pair<std::string const, mxnet::TShape> > > const&, std::unordered_map<std::string, int, std::hash, std::equal_to, std::allocator<std::pair<std::string const, int> > > const&, std::unordered_map<std::string, int, std::hash, std::equal_to, std::allocator<std::pair<std::string const, int> > > const&, std::vector<mxnet::OpReqType, std::allocator > const&, std::unordered_set<std::string, std::hash, std::equal_to, std::allocator > const&, std::vector<mxnet::NDArray, std::allocator > , std::vector<mxnet::NDArray, std::allocator >, std::vector<mxnet::NDArray, std::allocator > , std::unordered_map<std::string, mxnet::NDArray, std::hash, std::equal_to, std::allocator<std::pair<std::string const, mxnet::NDArray> > >, mxnet::Executor, std::unordered_map<nnvm::NodeEntry, mxnet::NDArray, nnvm::NodeEntryHash, nnvm::NodeEntryEqual, std::allocator<std::pair<nnvm::NodeEntry const, mxnet::NDArray> > > const&)+0x6bc) [0x7f031eb2ed9c]