Open nabulsi opened 4 years ago
cc @mseth10
Hi @nabulsi , thanks for raising this issue. I have tested the instructions (here) on building MXNet on Jetson module myself and they work fine. There is also a tutorial that you can follow: https://mxnet.apache.org/api/python/docs/tutorials/deploy/inference/image_classification_jetson.html This tutorial also contains a link to (3rd party) MXNet wheel that you can directly use. This wheel has been built using the steps on the installation page.
Hi @mseth10 . Thanks for working on the issue. I just tried the method with the wheel that you mentioned. I also got the same error message:
root@user-jetson2:/home/user# python
Python 3.7.7 (default, Jun 25 2020, 13:11:10)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet
Segmentation fault: 11
Segmentation fault (core dumped)
Did you try using python 3.7?
Another thing: for me, building MXNet from source on the Jetson Nano usually doesn't finish. After about 2.5 hours the build simply stops, even though I have a 6 GB swap file :(
The Jetson Nano might need more swap space (>20 GB) and a very long build time. I built it on a Jetson Xavier AGX and it still took a few hours.
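To check how much swap the device actually has before starting a long build, a minimal sketch along these lines could help. It parses the `SwapTotal` line of `/proc/meminfo`. The function names are illustrative, and the 20 GiB threshold is only the rough figure mentioned in this thread, not a measured requirement.

```python
import re


def swap_total_gib(meminfo_text):
    """Return SwapTotal from /proc/meminfo-style text in GiB (0.0 if absent)."""
    match = re.search(r"^SwapTotal:\s+(\d+)\s+kB", meminfo_text, flags=re.MULTILINE)
    if match is None:
        return 0.0
    return int(match.group(1)) / (1024 ** 2)  # /proc/meminfo reports kB


def has_enough_swap(meminfo_text, required_gib=20.0):
    """Assumption: required_gib defaults to the rough figure quoted in the thread."""
    return swap_total_gib(meminfo_text) >= required_gib


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as handle:
            text = handle.read()
    except OSError:
        print("Could not read /proc/meminfo (not a Linux system?)")
    else:
        print("Swap: %.1f GiB, enough for the build: %s"
              % (swap_total_gib(text), has_enough_swap(text)))
```

With the 6 GB swap file described above, `has_enough_swap` would return `False` for the default threshold.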
Do you mean the following code (from the tutorial) segfaults? Which JetPack version is installed on your device? It should work with JetPack 4.4.
sudo apt-get update
sudo apt-get install -y git build-essential libopenblas-dev libopencv-dev python3-pip
sudo pip3 install -U pip
wget https://mxnet-public.s3.us-east-2.amazonaws.com/install/jetson/1.6.0/mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl
sudo pip3 install mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl
python3 -c 'import mxnet'
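On the Python 3.7 question: the wheel's filename carries compatibility tags that pip uses to decide whether it may install the file. The sketch below splits a wheel filename into those tags, following the standard wheel naming scheme (name, version, optional build tag, Python tag, ABI tag, platform tag). The helper names are illustrative. A `py2.py3-none` wheel is not filtered by Python minor version, so a successful install says nothing about whether the bundled native library works with a given interpreter.

```python
import platform


def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version, and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 parts when an optional build tag is present
        raise ValueError("unexpected wheel filename layout: %r" % filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tags": python_tag.split("."),
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def matches_machine(platform_tag, machine=None):
    """Rough check that the platform tag targets this CPU architecture."""
    machine = machine or platform.machine()
    return platform_tag == "any" or platform_tag.endswith("_" + machine)


info = parse_wheel_filename("mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl")
print(info["python_tags"], info["abi_tag"], info["platform_tag"])
print("matches this machine:", matches_machine(info["platform_tag"]))
```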
Yes, these are the steps that I followed. As I said, I had also tried cross compiling and then using the generated library on the Jetson, but I got the same error. I am currently reflashing my SD card with the original image and will try again on it, to see whether any changes I made are causing the problem.
Please let me know how it goes.
@mseth10 @leezu would we be able to offer a wheel through cross compilation?
@szha yeah it should be possible now that NVIDIA has shared their cross compilation toolchain on apt server.
Following the cross-build instructions locally was blocked for a few months due to non-public toolchain files. NVidia has now provided some files, but for example cuDNN is missing. @TristonC is tracking that internally and may update the build at https://github.com/apache/incubator-mxnet/pull/18450.
@mseth10 had you verified the cross-compile libmxnet on device already?
This is my update: Problem is not happening any more.
I used a fresh image provided by NVidia for my Nano and then went through all the steps again (installed python3.7, installed dependencies, etc..). Then I used the wheel mentioned above to install MXNet 1.6:
wget https://mxnet-public.s3.us-east-2.amazonaws.com/install/jetson/1.6.0/mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl
sudo pip3 install mxnet_cu102-1.6.0-py2.py3-none-linux_aarch64.whl
After that I compiled OpenCV 4.4.0 from source. The segmentation errors are not happening any more.
That said, I need your guidance with one thing, please:
I tried to cross compile MXNet 2.0.0 as described here, i.e. by using Docker on an EC2 instance (P3) with the Deep Learning AMI:
$MXNET_HOME/ci/build.py -p jetson
While I was able to import the generated library on the Nano, I received errors when trying to work with it:
import mxnet as mx
a = mx.nd.ones((2, 3), mx.gpu())
b = a * 2 + 1
b.asnumpy()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jetson/mxnet/python/mxnet/ndarray/ndarray.py", line 2570, in asnumpy
ctypes.c_size_t(data.size)))
File "/home/jetson/mxnet/python/mxnet/base.py", line 246, in check_call
raise get_last_ffi_error()
mxnet.base.MXNetError: Traceback (most recent call last):
File "/work/mxnet/src/operator/numpy/../tensor/../../common/../operator/mxnet_op.h", line 1132
Name: Check failed: err == cudaSuccess (209 vs. 0) : mxnet_generic_kernel ErrStr:no kernel image is available for execution on the device
Upon running cuobjdump /path/to/libmxnet.so I noticed that the architecture is showing as arch = sm_52, whereas, as we know, the Nano has sm_53.
How can I cross compile for the Nano on an EC2 instance?
Thanks!
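The architecture check described above can be scripted. The following is a minimal sketch that scans the text printed by `cuobjdump /path/to/libmxnet.so` for `arch = sm_XX` lines and reports whether a target architecture is present. The function names are illustrative, and the sketch does not distinguish ELF (SASS) entries from PTX entries, so treat a match as a hint rather than proof that the kernels will run.

```python
import re


def embedded_archs(cuobjdump_output):
    """Return the sorted, de-duplicated sm_XX values found in cuobjdump text."""
    return sorted(set(re.findall(r"arch\s*=\s*(sm_\d+)", cuobjdump_output)))


def has_arch(cuobjdump_output, target):
    """True if the target architecture (e.g. "sm_53") appears in the output."""
    return target in embedded_archs(cuobjdump_output)


# Abbreviated, illustrative input: only the line quoted in this thread.
sample = "arch = sm_52\n"
print(embedded_archs(sample))
print("contains sm_53:", has_arch(sample, "sm_53"))
```

To use it on a real library, capture the cuobjdump output to a file or string first and pass that text to `embedded_archs`.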
@nabulsi that's great news. I have not yet tested the cross compilation script provided on the installation page, and it might need some fixing. Until that is done, is there anything that you are blocked on currently, anything that you are unable to do with the wheel provided?
@mseth10 the wheel is enough for me at the moment. I can move forward now, but I am worried that in the next few days or weeks I will find that I need something more and will have to cross compile. It would be great if you could check it when you have some time. Also, I noticed that Makefile support was removed a few days ago, and consequently the build instructions on this doc page are no longer valid.
Thanks!
Description
>>> import mxnet
Segmentation fault: 11
Aborted (core dumped)
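When an import kills the interpreter like this, it can help to run the import in a child process with Python's built-in `faulthandler` enabled, so the crash does not take down the calling shell session and any low-level traceback is captured. This is a minimal sketch; the helper names `probe_import` and `describe` are illustrative and not part of MXNet.

```python
import signal
import subprocess
import sys


def probe_import(module_name, timeout=120):
    """Import module_name in a child interpreter; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", "import " + module_name],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Turn a child return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:  # on POSIX, a negative code means death by signal
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return "killed by signal %d" % -returncode
    return "exited with status %d" % returncode


if __name__ == "__main__":
    code, stderr = probe_import("mxnet")
    print(describe(code))
    print(stderr)
```

If the import segfaults, the captured stderr should include the faulthandler dump of the Python frames that were active at the time of the crash.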
Steps to reproduce
Next, on the Jetson Nano:
git clone --recursive https://github.com/apache/incubator-mxnet.git mxnet
added the following to ~/.bashrc:
export PATH=/usr/local/cuda/bin:$PATH
export MXNET_HOME=$HOME/mxnet/
export PYTHONPATH=$MXNET_HOME/python:$PYTHONPATH
source ~/.bashrc
Then every time I import mxnet I get the segmentation fault.
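Because the steps above put `$MXNET_HOME/python` on `PYTHONPATH`, a source checkout takes precedence over any pip-installed wheel, so it is worth confirming which copy Python would load. The sketch below uses `importlib.util.find_spec`, which locates a top-level module without executing it, so it is safe to run even when the import itself crashes. The helper name `locate_module` is illustrative.

```python
import importlib.util


def locate_module(module_name):
    """Return the path Python would load module_name from, or None if not found.

    For a top-level name, find_spec only searches sys.path and does not
    execute the module's code.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    print("mxnet would be loaded from:", locate_module("mxnet"))
```

A path under `$MXNET_HOME/python` means the source tree is being used; a path under `site-packages` or `dist-packages` means the installed wheel is.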
Environment
OS: Ubuntu 18
Some details regarding the Jetson Nano: