deepinsight / insightface

State-of-the-art 2D and 3D Face Analysis Project
https://insightface.ai

RuntimeError: simple_bind error. Arguments: #996

Open amrintdv opened 4 years ago

amrintdv commented 4 years ago

I ran into this error. Can you help me? Thank you very much!

call reset()
Traceback (most recent call last):
  File "train.py", line 378, in <module>
    main()
  File "train.py", line 375, in main
    train_net(args)
  File "train.py", line 370, in train_net
    epoch_end_callback = epoch_cb )
  File "/home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/module/base_module.py", line 498, in fit
    for_training=True, force_rebind=force_rebind)
  File "/home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/module/module.py", line 429, in bind
    state_names=self._state_names)
  File "/home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/module/executor_group.py", line 279, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/module/executor_group.py", line 375, in bind_exec
    shared_group))
  File "/home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/module/executor_group.py", line 662, in _bind_ith_exec
    shared_buffer=shared_data_arrays, *input_shapes)
  File "/home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/symbol/symbol.py", line 1629, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (4, 3, 112, 112)
softmax_label: (4,)
[13:19:41] src/engine/./../common/cuda_utils.h:310: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: CUDA: invalid device ordinal
Stack trace:
[bt] (0) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x4b03ab) [0x7f031c9d03ab]
[bt] (1) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x25cc459) [0x7f031eaec459]
[bt] (2) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2e64d96) [0x7f031f384d96]
[bt] (3) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x2e6a96f) [0x7f031f38a96f]
[bt] (4) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::NDArray(mxnet::TShape const&, mxnet::Context, bool, int)+0x5d0) [0x7f031ea515b0]
[bt] (5) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::common::InitZeros(mxnet::NDArrayStorageType, mxnet::TShape const&, mxnet::Context const&, int)+0x5c) [0x7f031eb052ac]
[bt] (6) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::common::ReshapeOrCreate(std::string const&, mxnet::TShape const&, int, mxnet::NDArrayStorageType, mxnet::Context const&, std::unordered_map<std::string, mxnet::NDArray, std::hash, std::equal_to, std::allocator<std::pair<std::string const, mxnet::NDArray> > >, bool)+0x3a1) [0x7f031eb189f1]
[bt] (7) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::exec::GraphExecutor::InitArguments(nnvm::IndexedGraph const&, std::vector<mxnet::TShape, std::allocator > const&, std::vector<int, std::allocator > const&, std::vector<int, std::allocator > const&, std::vector<mxnet::Context, std::allocator > const&, std::vector<mxnet::Context, std::allocator > const&, std::vector<mxnet::Context, std::allocator > const&, std::vector<mxnet::OpReqType, std::allocator > const&, std::unordered_set<std::string, std::hash, std::equal_to, std::allocator > const&, mxnet::Executor const, std::unordered_map<std::string, mxnet::NDArray, std::hash, std::equal_to, std::allocator<std::pair<std::string const, mxnet::NDArray> > >, std::vector<mxnet::NDArray, std::allocator >, std::vector<mxnet::NDArray, std::allocator >, std::vector<mxnet::NDArray, std::allocator >)+0xb10) [0x7f031eb209c0]
[bt] (8) /home/ai/anaconda3/envs/arcface/lib/python2.7/site-packages/mxnet/libmxnet.so(mxnet::exec::GraphExecutor::Init(nnvm::Symbol, mxnet::Context const&, std::map<std::string, mxnet::Context, std::less, std::allocator<std::pair<std::string const, mxnet::Context> > > const&, std::vector<mxnet::Context, std::allocator > const&, std::vector<mxnet::Context, std::allocator > const&, std::vector<mxnet::Context, std::allocator > const&, std::unordered_map<std::string, mxnet::TShape, std::hash, std::equal_to, std::allocator<std::pair<std::string const, mxnet::TShape> > > const&, std::unordered_map<std::string, int, std::hash, std::equal_to, std::allocator<std::pair<std::string const, int> > > const&, std::unordered_map<std::string, int, std::hash, std::equal_to, std::allocator<std::pair<std::string const, int> > > const&, std::vector<mxnet::OpReqType, std::allocator > const&, std::unordered_set<std::string, std::hash, std::equal_to, std::allocator > const&, std::vector<mxnet::NDArray, std::allocator >, std::vector<mxnet::NDArray, std::allocator >, std::vector<mxnet::NDArray, std::allocator >, std::unordered_map<std::string, mxnet::NDArray, std::hash, std::equal_to, std::allocator<std::pair<std::string const, mxnet::NDArray> > >, mxnet::Executor, std::unordered_map<nnvm::NodeEntry, mxnet::NDArray, nnvm::NodeEntryHash, nnvm::NodeEntryEqual, std::allocator<std::pair<nnvm::NodeEntry const, mxnet::NDArray> > > const&)+0x6bc) [0x7f031eb2ed9c]

MUZLATAN commented 4 years ago

I ran into this problem before and solved it by reinstalling my NVIDIA driver. It may be caused by a CUDA version that does not match your NVIDIA driver. To check, you can run the test below:

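# Install the MXNet build that matches the CUDA version on your machine
# (mxnet-cu80 is for CUDA 8.0)
sudo pip install mxnet-cu80

# test.py
import mxnet as mx
# Allocate a small array on the default GPU; this fails if CUDA cannot use the device
mx.nd.array([0], ctx=mx.gpu())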

Then run the script. If you get an error like this:

mxnet.base.MXNetError: [14:40:28] src/storage/storage.cc:119: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading
 CUDA: CUDA driver version is insufficient for CUDA runtime version

that means you need to reinstall your NVIDIA driver. For details, see Table 1, "CUDA Toolkit and Compatible Driver Versions", in NVIDIA's documentation:

| CUDA Toolkit | Linux x86_64 Driver Version | Windows x86_64 Driver Version |
| --- | --- | --- |
| CUDA 10.2.89 | >= 440.33 | >= 441.22 |
| CUDA 10.1 (10.1.105 general) | >= 418.39 | >= 418.96 |
| CUDA 10.0.130 | >= 410.48 | >= 411.31 |
| CUDA 9.2 (9.2.148 Update 1) | >= 396.37 | >= 398.26 |
| CUDA 9.2 (9.2.88) | >= 396.26 | >= 397.44 |
| CUDA 9.1 (9.1.85) | >= 390.46 | >= 391.29 |
| CUDA 9.0 (9.0.76) | >= 384.81 | >= 385.54 |
| CUDA 8.0 (8.0.61 GA2) | >= 375.26 | >= 376.51 |
| CUDA 8.0 (8.0.44) | >= 367.48 | >= 369.30 |
| CUDA 7.5 (7.5.16) | >= 352.31 | >= 353.66 |
| CUDA 7.0 (7.0.28) | >= 346.46 | >= 347.62 |

Hope this can help you
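
As a rough illustration of how the table can be used, here is a minimal Python sketch that checks an installed Linux driver version against the minimum listed for a given toolkit release. The names MIN_LINUX_DRIVER, parse_version and driver_is_sufficient are made up for this example and are not part of MXNet or any NVIDIA tool. The version strings are copied from the Linux column of the table, and the driver version is assumed to be supplied by hand.

```python
# Minimum Linux x86_64 driver version per CUDA toolkit release,
# copied from the Linux column of the compatibility table in this thread.
MIN_LINUX_DRIVER = {
    "10.2.89": "440.33",
    "10.1.105": "418.39",
    "10.0.130": "410.48",
    "9.2.148": "396.37",
    "9.2.88": "396.26",
    "9.1.85": "390.46",
    "9.0.76": "384.81",
    "8.0.61": "375.26",
    "8.0.44": "367.48",
    "7.5.16": "352.31",
    "7.0.28": "346.46",
}


def parse_version(text):
    """Turn a dotted version string such as '440.33.01' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_is_sufficient(toolkit, driver):
    """Return True if `driver` meets the table's minimum for `toolkit`.

    Raises KeyError if the toolkit release is not in the table.
    """
    return parse_version(driver) >= parse_version(MIN_LINUX_DRIVER[toolkit])


if __name__ == "__main__":
    # Hypothetical values: CUDA 8.0.61 with an older 367.48 driver.
    print(driver_is_sufficient("8.0.61", "367.48"))
```

The installed driver version is shown in the header of the nvidia-smi output. If the check returns False, the driver is older than the toolkit needs, which matches the "driver version is insufficient" message above.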

goodgoodstudy92 commented 4 years ago

I also ran into this problem. I reinstalled the NVIDIA driver through the PPA rather than with ''sudo ./NVIDIA ***.run --no-opengl-files'', and then everything worked fine.

xhwNobody commented 3 years ago

@amrintdv Hello, I have the same problem as you. Could you tell me how you solved it? My error is:

src/engine/./../common/cuda_utils.h:310: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: CUDA: invalid device ordinal

not:

mxnet.base.MXNetError: [14:40:28] src/storage/storage.cc:119: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading CUDA: CUDA driver version is insufficient for CUDA runtime version.
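
For what it's worth, "invalid device ordinal" is the CUDA error reported when a GPU index is requested that does not exist on the machine, so it is worth ruling that out separately from a driver mismatch. Below is a minimal sketch of such a check. The function invalid_ordinals is a hypothetical helper, not part of MXNet, and the requested ids are assumed values, since how train.py picks its GPUs is not shown in this thread.

```python
def invalid_ordinals(requested, available_gpus):
    """Return the requested GPU indices that do not exist.

    CUDA numbers devices from 0 to available_gpus - 1, so any index
    outside that range would trigger "invalid device ordinal".
    """
    return [i for i in requested if i < 0 or i >= available_gpus]


if __name__ == "__main__":
    # Assumed example: the script asks for GPUs 0-3 on a machine with 2 GPUs.
    requested = [0, 1, 2, 3]
    available = 2
    # With MXNet installed, recent versions can report the count instead:
    #   import mxnet as mx
    #   available = mx.context.num_gpus()
    print(invalid_ordinals(requested, available))  # prints [2, 3]
```

Note that when CUDA_VISIBLE_DEVICES is set, the visible devices are renumbered starting from 0, so the valid indices depend on how many devices that variable exposes rather than on the physical count.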