NVIDIA / mxnet_to_onnx

MxNet to ONNX Exporter
Apache License 2.0

Compile with USE_CUDA=1 to enable GPU usage #3

Closed: gr8Adakron closed this issue 6 years ago

gr8Adakron commented 6 years ago

While running the test command, I got this error:

Command:

python setup.py test

Error:

Training LeNet-5 on MNIST data

Using gpu(1) to train
ERROR

======================================================================
ERROR: test_convert_and_compare_prediction (test_convert_lenet5.LeNet5Test)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/afzal/onnx_conversion/mxnet_to_onnx/tests/test_convert_lenet5.py", line 136, in test_convert_and_compare_prediction
    trained_lenet = train_lenet5(num_epochs, gpu_id, train_iter, val_iter, test_iter, batch_size)
  File "/home/afzal/onnx_conversion/mxnet_to_onnx/tests/test_convert_lenet5.py", line 97, in train_lenet5
    num_epoch=num_epochs)
  File "/home/afzal/.virtualenvs/tensorflow_py36/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/module/base_module.py", line 484, in fit
    for_training=True, force_rebind=force_rebind)
  File "/home/afzal/.virtualenvs/tensorflow_py36/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/module/module.py", line 430, in bind
    state_names=self._state_names)
  File "/home/afzal/.virtualenvs/tensorflow_py36/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/module/executor_group.py", line 265, in __init__
    self.bind_exec(data_shapes, label_shapes, shared_group)
  File "/home/afzal/.virtualenvs/tensorflow_py36/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/module/executor_group.py", line 361, in bind_exec
    shared_group))
  File "/home/afzal/.virtualenvs/tensorflow_py36/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/module/executor_group.py", line 639, in _bind_ith_exec
    shared_buffer=shared_data_arrays, **input_shapes)
  File "/home/afzal/.virtualenvs/tensorflow_py36/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/symbol/symbol.py", line 1519, in simple_bind
    raise RuntimeError(error_msg)
RuntimeError: simple_bind error. Arguments:
data: (1000, 1, 28, 28)
softmax_label: (1000,)
[13:00:41] src/storage/storage.cc:123: Compile with USE_CUDA=1 to enable GPU usage

Stack trace returned 10 entries:
[bt] (0) /home/afzal/.virtualenvs/tensorflow_py36/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/libmxnet.so(+0x1c05f2) [0x7fab4e0795f2]
[bt] (1) /home/afzal/.virtualenvs/tensorflow_py36/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/libmxnet.so(+0x1c0bd8) [0x7fab4e079bd8]
[bt] (2) /home/afzal/.virtualenvs/tensorflow_py36/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/libmxnet.so(+0x2d7d3cd) [0x7fab50c363cd]
[bt] (3) /home/afzal/.virtualenvs/tensorflow_py36/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/libmxnet.so(+0x2d8141d) [0x7fab50c3a41d]
[bt] (4) /home/afzal/.virtualenvs/tensorflow_py36/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/libmxnet.so(+0x2d83206) [0x7fab50c3c206]
[bt] (5) /home/afzal/.virtualenvs/tensorflow_py36/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/libmxnet.so(+0x27a2831) [0x7fab5065b831]
[bt] (6) /home/afzal/.virtualenvs/tensorflow_py36/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/libmxnet.so(+0x27a2984) [0x7fab5065b984]
[bt] (7) /home/afzal/.virtualenvs/tensorflow_py36/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/libmxnet.so(+0x27aecec) [0x7fab50667cec]
[bt] (8) /home/afzal/.virtualenvs/tensorflow_py36/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/libmxnet.so(+0x27b55f8) [0x7fab5066e5f8]
[bt] (9) /home/afzal/.virtualenvs/tensorflow_py36/lib/python3.6/site-packages/mxnet-1.2.0-py3.6-linux-x86_64.egg/mxnet/libmxnet.so(+0x27c163a) [0x7fab5067a63a]

-------------------- >> begin captured logging << --------------------
urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): data.mxnet.io
urllib3.connectionpool: DEBUG: http://data.mxnet.io:80 "GET /data/mnist/train-labels-idx1-ubyte.gz HTTP/1.1" 200 28881
root: INFO: downloaded http://data.mxnet.io/data/mnist/train-labels-idx1-ubyte.gz into train-labels-idx1-ubyte.gz successfully
urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): data.mxnet.io
urllib3.connectionpool: DEBUG: http://data.mxnet.io:80 "GET /data/mnist/train-images-idx3-ubyte.gz HTTP/1.1" 200 9912422
root: INFO: downloaded http://data.mxnet.io/data/mnist/train-images-idx3-ubyte.gz into train-images-idx3-ubyte.gz successfully
urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): data.mxnet.io
urllib3.connectionpool: DEBUG: http://data.mxnet.io:80 "GET /data/mnist/t10k-labels-idx1-ubyte.gz HTTP/1.1" 200 4542
root: INFO: downloaded http://data.mxnet.io/data/mnist/t10k-labels-idx1-ubyte.gz into t10k-labels-idx1-ubyte.gz successfully
urllib3.connectionpool: DEBUG: Starting new HTTP connection (1): data.mxnet.io
urllib3.connectionpool: DEBUG: http://data.mxnet.io:80 "GET /data/mnist/t10k-images-idx3-ubyte.gz HTTP/1.1" 200 1648877
root: INFO: downloaded http://data.mxnet.io/data/mnist/t10k-images-idx3-ubyte.gz into t10k-images-idx3-ubyte.gz successfully
root: INFO: train-labels-idx1-ubyte.gz exists, skipping download
root: INFO: train-images-idx3-ubyte.gz exists, skipping download
root: INFO: t10k-labels-idx1-ubyte.gz exists, skipping download
root: INFO: t10k-images-idx3-ubyte.gz exists, skipping download
--------------------- >> end captured logging << ---------------------

----------------------------------------------------------------------
Ran 1 test in 7.491s

FAILED (errors=1)
Test failed: <unittest.runner.TextTestResult run=1 errors=1 failures=0>
error: Test failed: <unittest.runner.TextTestResult run=1 errors=1 failures=0>

I have installed cuda-80 in order to use the GPU. Any help would be appreciated; this is urgent.

mkolod commented 6 years ago

@gr8Adakron it should work now.

whu-dft commented 5 years ago

I solved this problem by installing the CUDA build of MXNet: pip install mxnet-cu90.
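
To confirm whether the installed MXNet build can actually use the GPU before rerunning the tests, a small probe along these lines may help. It is only a sketch: the function name gpu_usable is made up for illustration, and it assumes that allocating a tiny array on mx.gpu() and reading it back is enough to trigger the "Compile with USE_CUDA=1" error on a CPU-only build.

```python
def gpu_usable(gpu_id=0):
    """Return True if a tiny array can be allocated on the given GPU.

    gpu_usable is a hypothetical helper name, not part of MXNet.
    """
    try:
        import mxnet as mx
        # Allocate on the GPU and copy back to the host; asnumpy() forces
        # the operation to complete so that any error surfaces here.
        mx.nd.zeros((1,), ctx=mx.gpu(gpu_id)).asnumpy()
        return True
    except Exception:
        # Covers a missing mxnet install, a CPU-only build,
        # or a GPU id that is not visible on this machine.
        return False


if __name__ == "__main__":
    print("GPU usable:", gpu_usable())
```

If this prints False after installing a CUDA package, the CPU-only mxnet package may still be the one being imported, so checking which mxnet package is installed in the active virtualenv is a reasonable next step.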

hyderit commented 4 years ago

I found this more helpful in fixing the problem: https://github.com/dmlc/gluon-cv/issues/698

flavienbwk commented 4 years ago

Use mxnet-cu102 for CUDA 10.2 with Python 3.7.
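
The package names mentioned in this thread (mxnet-cu90 for CUDA 9.0, mxnet-cu102 for CUDA 10.2) suggest a simple naming pattern. The helper below is a sketch of that pattern only: mxnet_cuda_package is a hypothetical name, and it assumes the pattern holds for the CUDA version you have, which is not guaranteed since a package is not published for every CUDA release.

```python
def mxnet_cuda_package(cuda_version):
    """Build a candidate pip package name from a CUDA version string.

    Hypothetical helper: it only follows the naming pattern seen in this
    thread and does not check that the package exists on PyPI.
    """
    # Keep only major and minor, so "10.2.89" is treated like "10.2".
    # A string without a minor part (for example "10") raises ValueError.
    major, minor = cuda_version.strip().split(".")[:2]
    return "mxnet-cu{}{}".format(int(major), int(minor))


if __name__ == "__main__":
    for version in ("9.0", "10.2"):
        print(version, "->", mxnet_cuda_package(version))
```

The CUDA version passed in should match the toolkit actually installed on the machine (for example, as reported by nvcc --version).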