installed mxnet-cu92 on ubuntu but can't run example code correctly #11535

Closed zhuotest closed 6 years ago

zhuotest commented 6 years ago

I'm on Ubuntu 16.04 with a GPU and pip, and I followed the instructions on the official documentation page (https://mxnet.incubator.apache.org/install/index.html?platform=Linux&language=Python&processor=GPU). Following those instructions, I can install mxnet-cu92, but I cannot run the example code correctly.

Environment info (Required)

➜  test python diag.py 
----------Python Info----------
('Version      :', '2.7.12')
('Compiler     :', 'GCC 5.4.0 20160609')
('Build        :', ('default', 'Dec  4 2017 14:50:18'))
('Arch         :', ('64bit', 'ELF'))
------------Pip Info-----------
('Version      :', '10.0.1')
('Directory    :', '/usr/local/lib/python2.7/dist-packages/pip')
----------MXNet Info-----------
('Version      :', '1.2.0')
('Directory    :', '/usr/local/lib/python2.7/dist-packages/mxnet')
('Commit Hash   :', '297c64fd2ee404612aa3ecc880b940fb2538039c')
----------System Info----------
('Platform     :', 'Linux-4.4.0-127-generic-x86_64-with-Ubuntu-16.04-xenial')
('system       :', 'Linux')
('node         :', '1080')
('release      :', '4.4.0-127-generic')
('version      :', '#153-Ubuntu SMP Sat May 19 10:58:46 UTC 2018')
----------Hardware Info----------
('machine      :', 'x86_64')
('processor    :', 'x86_64')
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Model name:            Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
Stepping:              3
CPU MHz:               3799.828
CPU max MHz:           4000.0000
CPU min MHz:           800.0000
BogoMIPS:              7183.49
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb invpcid_single kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt ibpb ibrs stibp dtherm ida arat pln pts
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0637 sec, LOAD: 2.6516 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.5115 sec, LOAD: 9.2181 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.2099 sec, LOAD: 1.9156 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.5327 sec, LOAD: 2.1375 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.8268 sec, LOAD: 3.7883 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.5108 sec, LOAD: 6.5522 sec.

Package used: Python 2.7.12

Build info

I first downloaded and installed CUDA 9.2 and cuDNN 7.1 from the NVIDIA website. Then I ran sudo pip install mxnet-cu92.
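One thing worth ruling out with this kind of install is a mismatch between the CUDA toolkit and the pip package suffix. The sketch below derives the expected package name from the text printed by `nvcc --version`. It assumes the `mxnet-cuXY` naming convention (CUDA X.Y maps to `mxnet-cuXY`) and that the nvcc output contains a `release X.Y` fragment. `mxnet_package_for` is a made-up helper name for this illustration, not part of MXNet.

```python
import re


def mxnet_package_for(nvcc_output):
    """Map `nvcc --version` text to a pip package name such as 'mxnet-cu92'.

    Returns None when no 'release X.Y' fragment is found.
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    # Assumed convention: CUDA 9.2 -> mxnet-cu92, CUDA 9.0 -> mxnet-cu90.
    return "mxnet-cu%s%s" % (match.group(1), match.group(2))


# Example line of the kind nvcc prints for CUDA 9.2.
sample = "Cuda compilation tools, release 9.2, V9.2.88"
print(mxnet_package_for(sample))
```

A matching name only shows that the package and toolkit agree with each other. It says nothing about whether the GPU driver is recent enough for that toolkit.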

Error Message:

see next section for full info

Minimum reproducible example

I ran the official website's example code in an interactive interpreter. The code and output are:

chris@1080:~$ python
Python 2.7.12 (default, Dec  4 2017, 14:50:18) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet as mx
>>> a = mx.nd.ones((2, 3), mx.gpu())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/mxnet/ndarray/ndarray.py", line 2271, in ones
    return _internal._ones(shape=shape, ctx=ctx, dtype=dtype, **kwargs)
  File "<string>", line 34, in _ones
  File "/usr/local/lib/python2.7/dist-packages/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
  File "/usr/local/lib/python2.7/dist-packages/mxnet/base.py", line 149, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [08:07:26] src/engine/threaded_engine.cc:318: Check failed: device_count_ > 0 (-1 vs. 0) GPU usage requires at least 1 GPU

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x308362) [0x7fc4411d8362]
[bt] (1) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x308938) [0x7fc4411d8938]
[bt] (2) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x29433a3) [0x7fc4438133a3]
[bt] (3) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x294445f) [0x7fc44381445f]
[bt] (4) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x29bf09f) [0x7fc44388f09f]
[bt] (5) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x29c3693) [0x7fc443893693]
[bt] (6) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x29c45e3) [0x7fc4438945e3]
[bt] (7) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(+0x28fc3cb) [0x7fc4437cc3cb]
[bt] (8) /usr/local/lib/python2.7/dist-packages/mxnet/libmxnet.so(MXImperativeInvokeEx+0x6f) [0x7fc4437cc98f]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7fc47d496e40]

Steps to reproduce

Install Ubuntu 16.04, CUDA 9.2, and cuDNN 7.1, and use Python 2.7.12. Open a terminal, start python, and type:

import mxnet as mx
a = mx.nd.ones((2, 3), mx.gpu())
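The two lines above raise as soon as MXNet cannot see a usable GPU. A defensive variant, sketched below, tries a tiny allocation on the GPU first and falls back to the CPU if MXNet raises. `pick_context` is a hypothetical helper, not an MXNet API. The `mxnet` module is passed in as an argument so that the helper itself does not need a GPU build in order to be imported. It relies only on `mx.gpu()`, `mx.cpu()`, `mx.nd.ones` and `mx.base.MXNetError`, all of which appear in the traceback or example above.

```python
def pick_context(mx):
    """Return mx.gpu() if a small allocation works there, else mx.cpu().

    `mx` is the imported mxnet module (for example `import mxnet as mx`).
    """
    try:
        ctx = mx.gpu()
        # asnumpy() blocks until the operation has actually run, so any
        # failure surfaces inside this try block.
        mx.nd.ones((1,), ctx=ctx).asnumpy()
        return ctx
    except mx.base.MXNetError:
        # Same exception type as in the traceback above.
        return mx.cpu()
```

With MXNet installed, usage would be `ctx = pick_context(mx)` followed by `a = mx.nd.ones((2, 3), ctx)`. This only hides the symptom: if a GPU is present but the fallback is taken, the installation still needs fixing.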

What have you tried to solve it?

Note that I can build official Caffe (https://github.com/BVLC/caffe) with my installed CUDA 9.2 and cuDNN 7.1. The build only gives some cuDNN warnings and completes successfully.

andrewfayres commented 6 years ago

Thank you for submitting the issue! @sandeep-krishnamurthy requesting this be labeled as installation.

zhuotest commented 6 years ago

I finally figured this out. It was my own incorrect configuration. Since I installed a newer version of CUDA (9.2), I should also have installed a newer GPU driver. In the end I installed nvidia-396 on my 1080Ti.
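For anyone hitting the same error, a driver that is too old for the toolkit can be spotted by comparing the installed driver version with the minimum the toolkit needs. The sketch below is an illustration only. It assumes `nvidia-smi` is on the PATH and supports the `--query-gpu=driver_version` query. The 396.26 minimum for CUDA 9.2 on Linux is an assumption based on NVIDIA's CUDA release notes and should be checked against the current compatibility table. All function names are invented for this example.

```python
from __future__ import print_function

import re
import subprocess

# Assumption: minimum Linux driver for CUDA 9.2; verify in NVIDIA's release notes.
MIN_DRIVER_FOR_CUDA_92 = (396, 26)


def parse_driver_version(text):
    """Turn a string such as '396.26' into (396, 26); None if no version found."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


def query_driver_version():
    """Ask nvidia-smi for the driver version; return None if that fails."""
    cmd = ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"]
    try:
        output = subprocess.check_output(cmd)
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi is missing or could not talk to the driver.
        return None
    return parse_driver_version(output.decode("utf-8", "replace"))


def driver_is_new_enough(version, minimum=MIN_DRIVER_FOR_CUDA_92):
    """Compare (major, minor) tuples; an unknown version counts as too old."""
    return version is not None and version >= minimum


if __name__ == "__main__":
    version = query_driver_version()
    if version is None:
        print("Could not read a driver version from nvidia-smi")
    elif driver_is_new_enough(version):
        print("Driver %d.%d meets the assumed minimum for CUDA 9.2" % version)
    else:
        print("Driver %d.%d is older than %d.%d; upgrade the driver"
              % (version + MIN_DRIVER_FOR_CUDA_92))
```

Only the major and minor parts of the version are compared, which is enough to tell a 384-series driver from a 396-series one.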

rohun-tripathi commented 5 years ago

Hey, how did you get the nvidia-396 driver?

ChaiBapchya commented 4 years ago

@zhuotest can confirm, but @rohun-tripathi, does this help: https://www.nvidia.com/drivers/beta ?