apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Jetson: Segmentation Fault 11 When Importing MXNET #18759

Open nabulsi opened 4 years ago

nabulsi commented 4 years ago

Description

>>> import mxnet

Segmentation fault: 11

Aborted (core dumped)

Steps to reproduce

Next, on the Jetson Nano:

source ~/.bashrc

Then, every time I import mxnet, I get the segmentation fault:

curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python

----------Python Info----------
Version      : 3.7.7
Compiler     : GCC 7.5.0
Build        : ('default', 'Jun 25 2020 13:11:10')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 20.1.1
Directory    : /home/username/python3.7/lib/python3.7/site-packages/pip
----------MXNet Info-----------

Segmentation fault: 11

Aborted (core dumped)
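Since the crash happens before the diagnostic script can print any MXNet info, it can help to capture where the segfault actually occurs. A possible debugging sketch, assuming gdb and Python's stdlib faulthandler are available on the Nano:

```shell
# Dump Python-level tracebacks on a fatal signal (stdlib faulthandler)
python3 -X faulthandler -c 'import mxnet'

# Capture the native stack trace at the point of the crash with gdb
gdb --batch -ex run -ex bt --args python3 -c 'import mxnet'
```

The gdb backtrace would show which shared library (libmxnet.so, a CUDA library, OpenCV, etc.) the crash originates in.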

Environment

OS: Ubuntu 18

Some details regarding the Jetson Nano:

Device 0: "NVIDIA Tegra X1"
  CUDA Driver Version / Runtime Version          10.2 / 10.2
  CUDA Capability Major/Minor version number:    5.3
  Total amount of global memory:                 3956 MBytes (4148449280 bytes)
  ( 1) Multiprocessors, (128) CUDA Cores/MP:     128 CUDA Cores
  GPU Max Clock rate:                            922 MHz (0.92 GHz)
  Memory Clock rate:                             13 Mhz
  Memory Bus Width:                              64-bit
  L2 Cache Size:                                 262144 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            Yes
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 0 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS
leezu commented 4 years ago

cc @mseth10

mseth10 commented 4 years ago

Hi @nabulsi , thanks for raising this issue. I have tested the instructions (here) for building MXNet on a Jetson module myself, and they work fine. There is also a tutorial that you can follow: https://mxnet.apache.org/api/python/docs/tutorials/deploy/inference/image_classification_jetson.html The tutorial also links to a (3rd party) MXNet wheel that you can use directly; that wheel was built using the steps on the installation page.

nabulsi commented 4 years ago

Hi @mseth10 . Thanks for working on the issue. I just tried the wheel method you mentioned and got the same error message:

root@user-jetson2:/home/user# python
Python 3.7.7 (default, Jun 25 2020, 13:11:10) 
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet

Segmentation fault: 11

Segmentation fault (core dumped)

Did you try using python 3.7?

Another thing: for me, building MXNet from source on the Jetson Nano usually doesn't finish. After 2.5 hours it just gets interrupted, even though I have a 6 GB swap file :(

mseth10 commented 4 years ago

Jetson Nano might need more swap memory (>20 GB) and a very long time. I built it on a Jetson Xavier AGX and it still took a few hours.
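For reference, a larger swap file can be set up along these lines. This is a sketch: the 24 GB size and the /var/swapfile path are illustrative, the commands require root, and the SD card must have enough free space.

```shell
# Create and enable a 24 GB swap file (size and path are illustrative)
sudo fallocate -l 24G /var/swapfile
sudo chmod 600 /var/swapfile
sudo mkswap /var/swapfile
sudo swapon /var/swapfile

# Verify the new swap area is active
swapon --show
```

To make the swap file persist across reboots, it would also need an entry in /etc/fstab.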

Do you mean the following code (from the tutorial) segfaults? Which JetPack version is installed on your device? It should work with JetPack 4.4:

sudo apt-get update
sudo apt-get install -y git build-essential libopenblas-dev libopencv-dev python3-pip
sudo pip3 install -U pip

wget https://mxnet-public.s3.us-east-2.amazonaws.com/install/jetson/1.6.0/mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl
sudo pip3 install mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl

python3 -c 'import mxnet'
nabulsi commented 4 years ago

Yes, these are the steps that I followed. As mentioned, I also tried cross compiling earlier and then using the generated library on the Jetson, but got the same issue. I am currently flashing my SD card with the original image and will try again on it to see whether any changes I made are causing the problem.

mseth10 commented 4 years ago

Please let me know how it goes.

szha commented 4 years ago

@mseth10 @leezu would we be able to offer a wheel through cross compilation?

mseth10 commented 4 years ago

@szha yeah, it should be possible now that NVIDIA has shared their cross-compilation toolchain on the apt server.

leezu commented 4 years ago

Following the cross-build instructions locally was blocked for a few months due to non-public toolchain files. NVIDIA has now provided some files, but cuDNN, for example, is still missing. @TristonC is tracking that internally and may update the build at https://github.com/apache/incubator-mxnet/pull/18450.

@mseth10 had you already verified the cross-compiled libmxnet on the device?

nabulsi commented 4 years ago

Update: the problem is not happening any more.

I used a fresh image provided by NVIDIA for my Nano and then went through all the steps again (installed Python 3.7, installed dependencies, etc.). Then I used the wheel mentioned above to install MXNet 1.6:

wget https://mxnet-public.s3.us-east-2.amazonaws.com/install/jetson/1.6.0/mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl
sudo pip3 install mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl

After that I compiled OpenCV 4.4.0 from source. The segmentation faults are not happening any more.

That said, I need your guidance with one thing: I tried to cross compile MXNet 2.0.0 as described here, i.e. by using Docker on an EC2 instance (P3) with the Deep Learning AMI: $MXNET_HOME/ci/build.py -p jetson

While I was able to import the generated library on the Nano, I received errors when trying to work with it:

import mxnet as mx
a = mx.nd.ones((2, 3), mx.gpu())
b = a * 2 + 1
b.asnumpy()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jetson/mxnet/python/mxnet/ndarray/ndarray.py", line 2570, in asnumpy
    ctypes.c_size_t(data.size)))
  File "/home/jetson/mxnet/python/mxnet/base.py", line 246, in check_call
    raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
  File "/work/mxnet/src/operator/numpy/../tensor/../../common/../operator/mxnet_op.h", line 1132
Name: Check failed: err == cudaSuccess (209 vs. 0) : mxnet_generic_kernel ErrStr:no kernel image is available for execution on the device

Upon running cuobjdump /path/to/libmxnet.so I noticed that the architecture shows as arch = sm_52, whereas, as we know, the Nano has sm_53.

How can I cross compile for the Nano on an EC2 instance?
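In principle the fix is to make the cross-compilation build target compute capability 5.3 instead of 5.2. A sketch of what that could look like with MXNet's CMake build; the exact flag name may differ between versions, so treat MXNET_CUDA_ARCH here as an assumption to verify against the build files:

```shell
# Configure the build for the Nano's compute capability (sm_53);
# MXNET_CUDA_ARCH is an assumed flag name, check your version's CMake options
cmake -DUSE_CUDA=ON -DMXNET_CUDA_ARCH="5.3" /path/to/mxnet
make -j"$(nproc)"

# Afterwards, confirm the embedded architecture in the resulting library
cuobjdump libmxnet.so | grep 'arch ='   # expect sm_53
```

The same cuobjdump check used above to spot the sm_52 mismatch then serves to verify the rebuilt library before copying it to the device.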

Thanks!

mseth10 commented 4 years ago

@nabulsi that's great news. I have not yet tested the cross-compilation script provided on the installation page, and it might need some fixing. Until that is done, is there anything that you are blocked on currently, anything that you are unable to do with the provided wheel?

nabulsi commented 4 years ago

@mseth10 the wheel is currently enough for me. I can move forward now, but I am worried that in the next few days/weeks I will find I need something more and will have to cross compile. It would be great if you could check it when you have some time. Also, I noticed that Makefile support was removed a few days ago, and consequently the build instructions on this doc page are no longer valid.
Thanks!