apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

RuntimeError: Cannot find the MXNet library. #20144

Closed: Raunak-Singh-Inventor closed this issue 3 years ago

Raunak-Singh-Inventor commented 3 years ago

Description

I am trying to build MXNet from source on my 32-bit Raspberry Pi 4, and the install step fails with the error shown below.

I think the error occurs because setup.py can't find libmxnet.so in mxnet/python/mxnet/. Here is the output of ls mxnet/python/mxnet/, run from my home directory, where I cloned the repo.

[Screenshot: output of ls mxnet/python/mxnet/]

Hope someone has a solution :)

Error Message

Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Obtaining file:///home/pi/Downloads/apache-mxnet-src-1.4.0-incubating/python
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/home/pi/Downloads/apache-mxnet-src-1.4.0-incubating/python/setup.py'"'"'; __file__='"'"'/home/pi/Downloads/apache-mxnet-src-1.4.0-incubating/python/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-9ni2u7t1
         cwd: /home/pi/Downloads/apache-mxnet-src-1.4.0-incubating/python/
    Complete output (12 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/home/pi/Downloads/apache-mxnet-src-1.4.0-incubating/python/setup.py", line 47, in <module>
        LIB_PATH = libinfo['find_lib_path']()
      File "/home/pi/Downloads/apache-mxnet-src-1.4.0-incubating/python/mxnet/libinfo.py", line 74, in find_lib_path
        'List of candidates:\n' + str('\n'.join(dll_path)))
    RuntimeError: Cannot find the MXNet library.
    List of candidates:
    /home/pi/Downloads/apache-mxnet-src-1.4.0-incubating/python/mxnet/libmxnet.so
    /home/pi/Downloads/apache-mxnet-src-1.4.0-incubating/python/mxnet/../../lib/libmxnet.so
    /home/pi/Downloads/apache-mxnet-src-1.4.0-incubating/python/mxnet/../../build/libmxnet.so
    ../../../libmxnet.so
    ----------------------------------------
WARNING: Discarding file:///home/pi/Downloads/apache-mxnet-src-1.4.0-incubating/python. Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
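The candidate list in the traceback above shows where setup.py looks for the compiled library: next to the Python package, in lib/, and in build/ under the source root. A minimal sketch for checking those three locations by hand, assuming a POSIX shell. The function name find_libmxnet is made up for illustration and is not part of MXNet, and the fourth, relative candidate from the traceback is left out:

```shell
# Hypothetical helper (not part of MXNet): print the first existing
# libmxnet.so among the absolute locations listed in the traceback.
find_libmxnet() {
  src_root="$1"   # source root, i.e. the directory that contains python/
  for candidate in \
    "$src_root/python/mxnet/libmxnet.so" \
    "$src_root/lib/libmxnet.so" \
    "$src_root/build/libmxnet.so"
  do
    if [ -f "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "libmxnet.so not found under $src_root; the library has to be built first" >&2
  return 1
}

# Example: find_libmxnet "$HOME/mxnet"
```

If the function prints nothing on standard output, the pip install step has no library to pick up, which would match the RuntimeError above.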

To Reproduce

https://mxnet.apache.org/get_started/build_from_source#install-mxnet-for-python

Steps to reproduce

  1. git clone --recursive https://github.com/apache/incubator-mxnet mxnet
  2. cd mxnet
  3. sudo apt-get update
  4. sudo apt-get install -y build-essential git ninja-build ccache libopenblas-dev libopencv-dev cmake
  5. sudo apt install gfortran
  6. python3 -m pip install --user -e ./python

What have you tried to solve it?

  1. Looking for libmxnet.so in the directory. It was not there.
  2. I tried downloading libmxnet.so from https://github.com/awslabs/mxnet-lambda/blob/master/src/mxnet/libmxnet.so. Then I ran some commands in the terminal:
    • mv Downloads/libmxnet.so mxnet/python/mxnet/
    • cd mxnet/python/mxnet/
    • ls
    • cd && cd mxnet
    • python3 -m pip install --user -e ./python

After running these commands, pip reports "Successfully installed mxnet", but when I try to import mxnet in python3, it fails with: libf77blas.so.3: cannot open shared object file: No such file or directory
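One thing worth ruling out with a library downloaded from another project is an architecture mismatch, since a prebuilt libmxnet.so is not necessarily built for a 32-bit ARM system. A minimal sketch that reads the word size and CPU type from the ELF header, assuming a POSIX shell with od; the function name elf_arch is hypothetical. This is a separate question from the libf77blas.so.3 message, which names a different missing library:

```shell
# Hypothetical helper: report word size and CPU type from an ELF header.
elf_arch() {
  file_path="$1"
  magic=$(od -An -tx1 -N4 "$file_path" | tr -d ' \n')
  if [ "$magic" != "7f454c46" ]; then
    echo "not an ELF file"
    return 1
  fi
  class=$(od -An -tu1 -j4 -N1 "$file_path" | tr -d ' \n')  # 1 = 32-bit, 2 = 64-bit
  data=$(od -An -tu1 -j5 -N1 "$file_path" | tr -d ' \n')   # 1 = little endian, 2 = big endian
  # e_machine is a 2-byte field at offset 18; read its bytes one at a time
  # so the result does not depend on the byte order of the host.
  set -- $(od -An -tu1 -j18 -N2 "$file_path")
  if [ "$data" = "2" ]; then
    machine=$(( $1 * 256 + $2 ))
  else
    machine=$(( $2 * 256 + $1 ))
  fi
  case "$class" in
    1) bits="32-bit" ;;
    2) bits="64-bit" ;;
    *) bits="unknown-class" ;;
  esac
  case "$machine" in
    3)   cpu="x86" ;;
    40)  cpu="arm" ;;
    62)  cpu="x86-64" ;;
    183) cpu="aarch64" ;;
    *)   cpu="machine-$machine" ;;
  esac
  echo "$bits $cpu"
}

# Example: elf_arch mxnet/python/mxnet/libmxnet.so
```

On the Raspberry Pi described in this issue, a usable library would be expected to report "32-bit arm"; any other result suggests the file was built for a different platform.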

I have no clue what to do? 😕

Environment

We recommend using our script for collecting the diagnostic information with the following command curl --retry 10 -s https://raw.githubusercontent.com/apache/incubator-mxnet/master/tools/diagnose.py | python3

Environment Information
```
----------Python Info----------
Version : 3.7.3
Compiler : GCC 8.3.0
Build : ('default', 'Jul 25 2020 13:03:44')
Arch : ('32bit', 'ELF')
------------Pip Info-----------
Version : 21.0.1
Directory : /home/pi/.local/lib/python3.7/site-packages/pip
----------MXNet Info-----------
An error occured trying to import mxnet.
This is very likely due to missing missing or incompatible library files.
Traceback (most recent call last):
  File "", line 96, in check_mxnet
AttributeError: module 'mxnet' has no attribute '__version__'
----------System Info----------
Platform : Linux-5.10.17-v7l+-armv7l-with-debian-10.8
system : Linux
node : raspberrypi
release : 5.10.17-v7l+
version : #1403 SMP Mon Feb 22 11:33:35 GMT 2021
----------Hardware Info----------
machine : armv7l
processor :
Architecture: armv7l
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Vendor ID: ARM
Model: 3
Model name: Cortex-A72
Stepping: r0p3
CPU max MHz: 1500.0000
CPU min MHz: 600.0000
BogoMIPS: 270.00
Flags: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0104 sec, LOAD: 0.0684 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0031 sec, LOAD: 0.0587 sec.
Error open Gluon Tutorial(cn): https://zh.gluon.ai, , DNS finished in 0.003855466842651367 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0018 sec, LOAD: 0.1406 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0050 sec, LOAD: 0.2012 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.006901264190673828 sec.
----------Environment----------
```
github-actions[bot] commented 3 years ago

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue. Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly. If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.

Raunak-Singh-Inventor commented 3 years ago

Update:

After a little more exploration, I found out that I forgot the compiling step, and that is why I can't find libmxnet.so. Now I have a new bug to report.

-- CMAKE_CROSSCOMPILING FALSE
-- CMAKE_HOST_SYSTEM_PROCESSOR armv7l
-- CMAKE_SYSTEM_PROCESSOR armv7l
-- CMAKE_SYSTEM_NAME Linux
-- CMake version '3.18.4' using generator 'Unix Makefiles'
CMake Error at /home/pi/.local/lib/python3.7/site-packages/cmake/data/share/cmake-3.18/Modules/CMakeDetermineCUDACompiler.cmake:166 (message):
  Could not find nvcc, please set CUDAToolkit_ROOT.
Call Stack (most recent call first):
  CMakeLists.txt:123 (enable_language)

-- Configuring incomplete, errors occurred!
See also "/home/pi/mxnet/build/CMakeFiles/CMakeOutput.log".
See also "/home/pi/mxnet/build/CMakeFiles/CMakeError.log".

Here are the steps to reproduce:

  1. git clone --recursive https://github.com/apache/incubator-mxnet mxnet
  2. cd mxnet
  3. sudo apt-get update
  4. sudo apt-get install -y build-essential git ninja-build ccache libopenblas-dev libopencv-dev cmake
  5. sudo apt install gfortran
  6. mkdir build
  7. cd build/
  8. cmake ..
  9. python3 -m pip install --user -e ./python

@szha and @leezu Can you help me?

szha commented 3 years ago

@Raunak-Singh-Inventor The error says nvcc was not found, which means CMake is looking for CUDA support on your Raspberry Pi. I think the fix is simply to turn USE_CUDA off. In fact, you should be able to rely on the Linux CPU build config for building on the RPi: copy it to the root of mxnet and name it config.cmake to make it active.
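A minimal sketch of that suggestion, assuming the CPU config template sits at config/linux.cmake inside the checkout. That path is an assumption, since the comment above does not spell it out, and the function name activate_build_config is hypothetical:

```shell
# Hypothetical helper: copy a config template to <src_root>/config.cmake
# and print the USE_CUDA line so the setting can be checked by eye.
activate_build_config() {
  src_root="$1"
  template="${2:-config/linux.cmake}"   # assumed location of the CPU template
  if [ ! -f "$src_root/$template" ]; then
    echo "template not found: $src_root/$template" >&2
    return 1
  fi
  cp "$src_root/$template" "$src_root/config.cmake" || return 1
  grep -n 'USE_CUDA' "$src_root/config.cmake" || echo "no USE_CUDA line in config.cmake"
}

# Example: activate_build_config "$HOME/mxnet"
```

If the printed line still shows USE_CUDA set to ON, the copied file is probably not the CPU variant.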

Raunak-Singh-Inventor commented 3 years ago

Thanks for the reply @szha.

After adding config.cmake to the root of mxnet, I am getting a new error in the compilation:

-- The C compiler identification is GNU 8.3.0
-- The CXX compiler identification is GNU 8.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_CROSSCOMPILING FALSE
-- CMAKE_HOST_SYSTEM_PROCESSOR armv7l
-- CMAKE_SYSTEM_PROCESSOR armv7l
-- CMAKE_SYSTEM_NAME Linux
-- CMake version '3.18.4' using generator 'Unix Makefiles'
-- CMAKE_BUILD_TYPE is unset, defaulting to Release
CMake Error at 3rdparty/onednn/CMakeLists.txt:87 (message):
  oneDNN supports 64 bit platforms only

-- Configuring incomplete, errors occurred!
See also "/home/pi/mxnet/build/CMakeFiles/CMakeOutput.log".
Raunak-Singh-Inventor commented 3 years ago

I am also including the CMakeOutput.log: CMakeOutput.log

Raunak-Singh-Inventor commented 3 years ago

@szha I still have an error. Can you help me?

leezu commented 3 years ago

You can update the config file (config.cmake) to disable oneDNN.
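A minimal sketch of such an edit, assuming config.cmake contains lines of the form set(NAME ON CACHE BOOL "..."). The option name USE_ONEDNN in the example is an assumption based on the 3rdparty/onednn path in the error, and some checkouts may spell it differently (for instance USE_MKLDNN), so it is worth listing the matching lines first. The function name disable_cmake_option is hypothetical:

```shell
# Hypothetical helper: flip one set(<NAME> ON ...) line to OFF in a config file.
disable_cmake_option() {
  config_file="$1"
  option_name="$2"
  sed "s/^\([[:space:]]*set($option_name[[:space:]][[:space:]]*\)ON\([[:space:])]\)/\1OFF\2/" \
    "$config_file" > "$config_file.tmp" && mv "$config_file.tmp" "$config_file"
}

# Example:
#   grep -n -i 'dnn' config.cmake              # find the exact option name first
#   disable_cmake_option config.cmake USE_ONEDNN
```

If an earlier configure run already cached the old value, configuring in a fresh build directory avoids picking up the stale setting.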

Raunak-Singh-Inventor commented 3 years ago

@leezu Disabling oneDNN fixed it. I am getting a new error now when running cmake --build .

[ 61%] Building CXX object CMakeFiles/mxnet.dir/src/operator/numpy/np_ediff1d_op.cc.o
[ 61%] Building CXX object CMakeFiles/mxnet.dir/src/operator/numpy/np_einsum_op.cc.o
In file included from /usr/include/c++/8/vector:69,
                 from /home/pi/mxnet/include/dmlc/registry.h:11,
                 from /home/pi/mxnet/include/mxnet/operator_util.h:37,
                 from /home/pi/mxnet/src/operator/numpy/./np_einsum_op-inl.h:65,
                 from /home/pi/mxnet/src/operator/numpy/np_einsum_op.cc:58:
/usr/include/c++/8/bits/vector.tcc: In member function ‘std::vector<_Tp, _Alloc>::iterator std::vector<_Tp, _Alloc>::_M_erase(std::vector<_Tp, _Alloc>::iterator) [with _Tp = mxnet::op::Alternative; _Alloc = std::allocator<mxnet::op::Alternative>]’:
/usr/include/c++/8/bits/vector.tcc:159:5: note: parameter passing for argument of type ‘std::vector<mxnet::op::Alternative>::iterator’ {aka ‘__gnu_cxx::__normal_iterator<mxnet::op::Alternative*, std::vector<mxnet::op::Alternative> >’} changed in GCC 7.1
     vector<_Tp, _Alloc>::
     ^~~~~~~~~~~~~~~~~~~
/usr/include/c++/8/bits/vector.tcc: In member function ‘void std::vector<_Tp, _Alloc>::_M_realloc_insert(std::vector<_Tp, _Alloc>::iterator, _Args&& ...) [with _Args = {const mxnet::op::Alternative&}; _Tp = mxnet::op::Alternative; _Alloc = std::allocator<mxnet::op::Alternative>]’:
/usr/include/c++/8/bits/vector.tcc:413:7: note: parameter passing for argument of type ‘std::vector<mxnet::op::Alternative>::iterator’ {aka ‘__gnu_cxx::__normal_iterator<mxnet::op::Alternative*, std::vector<mxnet::op::Alternative> >’} changed in GCC 7.1
       vector<_Tp, _Alloc>::
       ^~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/8/vector:64,
                 from /home/pi/mxnet/include/dmlc/registry.h:11,
                 from /home/pi/mxnet/include/mxnet/operator_util.h:37,
                 from /home/pi/mxnet/src/operator/numpy/./np_einsum_op-inl.h:65,
                 from /home/pi/mxnet/src/operator/numpy/np_einsum_op.cc:58:
/usr/include/c++/8/bits/stl_vector.h: In function ‘void mxnet::op::_update_other_results(std::vector<mxnet::op::Alternative>*, const mxnet::op::Alternative&)’:
/usr/include/c++/8/bits/stl_vector.h:1318:58: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator<mxnet::op::Alternative*, std::vector<mxnet::op::Alternative> >’ changed in GCC 7.1
       { return _M_erase(begin() + (__position - cbegin())); }
                                                          ^
/usr/include/c++/8/bits/stl_vector.h: In function ‘std::vector<std::vector<int> > mxnet::op::_greedy_path(const SetVector*, const std::bitset<128>&, const dim_t*, size_t)’:
/usr/include/c++/8/bits/stl_vector.h:1085:4: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator<mxnet::op::Alternative*, std::vector<mxnet::op::Alternative> >’ changed in GCC 7.1
    _M_realloc_insert(end(), __x);
    ^~~~~~~~~~~~~~~~~
/usr/include/c++/8/bits/stl_vector.h:1085:4: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator<mxnet::op::Alternative*, std::vector<mxnet::op::Alternative> >’ changed in GCC 7.1
    _M_realloc_insert(end(), __x);
    ^~~~~~~~~~~~~~~~~
/usr/include/c++/8/bits/stl_vector.h:1085:4: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator<mxnet::op::Alternative*, std::vector<mxnet::op::Alternative> >’ changed in GCC 7.1
    _M_realloc_insert(end(), __x);
    ^~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/8/vector:69,
                 from /home/pi/mxnet/include/dmlc/registry.h:11,
                 from /home/pi/mxnet/include/mxnet/operator_util.h:37,
                 from /home/pi/mxnet/src/operator/numpy/./np_einsum_op-inl.h:65,
                 from /home/pi/mxnet/src/operator/numpy/np_einsum_op.cc:58:
/usr/include/c++/8/bits/vector.tcc: In member function ‘std::vector<_Tp, _Alloc>::iterator std::vector<_Tp, _Alloc>::_M_erase(std::vector<_Tp, _Alloc>::iterator) [with _Tp = mxnet::TBlob; _Alloc = std::allocator<mxnet::TBlob>]’:
/usr/include/c++/8/bits/vector.tcc:159:5: note: parameter passing for argument of type ‘std::vector<mxnet::TBlob>::iterator’ {aka ‘__gnu_cxx::__normal_iterator<mxnet::TBlob*, std::vector<mxnet::TBlob> >’} changed in GCC 7.1
     vector<_Tp, _Alloc>::
     ^~~~~~~~~~~~~~~~~~~
/usr/include/c++/8/bits/vector.tcc: In member function ‘void std::vector<_Tp, _Alloc>::_M_realloc_insert(std::vector<_Tp, _Alloc>::iterator, _Args&& ...) [with _Args = {mxnet::TBlob}; _Tp = mxnet::TBlob; _Alloc = std::allocator<mxnet::TBlob>]’:
/usr/include/c++/8/bits/vector.tcc:413:7: note: parameter passing for argument of type ‘std::vector<mxnet::TBlob>::iterator’ {aka ‘__gnu_cxx::__normal_iterator<mxnet::TBlob*, std::vector<mxnet::TBlob> >’} changed in GCC 7.1
       vector<_Tp, _Alloc>::
       ^~~~~~~~~~~~~~~~~~~
/usr/include/c++/8/bits/vector.tcc: In member function ‘std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::emplace_back(_Args&& ...) [with _Args = {mxnet::TBlob}; _Tp = mxnet::TBlob; _Alloc = std::allocator<mxnet::TBlob>]’:
/usr/include/c++/8/bits/vector.tcc:109:4: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator<mxnet::TBlob*, std::vector<mxnet::TBlob> >’ changed in GCC 7.1
    _M_realloc_insert(end(), std::forward<_Args>(__args)...);
    ^~~~~~~~~~~~~~~~~
/usr/include/c++/8/bits/vector.tcc: In member function ‘void std::vector<_Tp, _Alloc>::_M_realloc_insert(std::vector<_Tp, _Alloc>::iterator, _Args&& ...) [with _Args = {const mxnet::TBlob&}; _Tp = mxnet::TBlob; _Alloc = std::allocator<mxnet::TBlob>]’:
/usr/include/c++/8/bits/vector.tcc:413:7: note: parameter passing for argument of type ‘std::vector<mxnet::TBlob>::iterator’ {aka ‘__gnu_cxx::__normal_iterator<mxnet::TBlob*, std::vector<mxnet::TBlob> >’} changed in GCC 7.1
       vector<_Tp, _Alloc>::
       ^~~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/8/vector:64,
                 from /home/pi/mxnet/include/dmlc/registry.h:11,
                 from /home/pi/mxnet/include/mxnet/operator_util.h:37,
                 from /home/pi/mxnet/src/operator/numpy/./np_einsum_op-inl.h:65,
                 from /home/pi/mxnet/src/operator/numpy/np_einsum_op.cc:58:
/usr/include/c++/8/bits/stl_vector.h: In member function ‘void std::vector<_Tp, _Alloc>::push_back(const value_type&) [with _Tp = mxnet::TBlob; _Alloc = std::allocator<mxnet::TBlob>]’:
/usr/include/c++/8/bits/stl_vector.h:1085:4: note: parameter passing for argument of type ‘__gnu_cxx::__normal_iterator<mxnet::TBlob*, std::vector<mxnet::TBlob> >’ changed in GCC 7.1
    _M_realloc_insert(end(), __x);
    ^~~~~~~~~~~~~~~~~
/usr/include/c++/8/bits/stl_vector.h: In function ‘std::vector<_Tp, _Alloc>::vector(std::initializer_list<_Tp>, const allocator_type&) [with _Tp = mxnet::TBlob; _Alloc = std::allocator<mxnet::TBlob>]’:
/usr/include/c++/8/bits/stl_vector.h:515:7: note: parameter passing for argument of type ‘std::initializer_list<mxnet::TBlob>’ changed in GCC 7.1
       vector(initializer_list<value_type> __l,
       ^~~~~~
virtual memory exhausted: Cannot allocate memory
make[2]: *** [CMakeFiles/mxnet.dir/build.make:3670: CMakeFiles/mxnet.dir/src/operator/numpy/np_einsum_op.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1202: CMakeFiles/mxnet.dir/all] Error 2
make: *** [Makefile:160: all] Error 2
Raunak-Singh-Inventor commented 3 years ago

Also, here are some stats:

command: free -h

              total        used        free      shared  buff/cache   available
Mem:          3.7Gi       505Mi       2.9Gi       149Mi       301Mi       3.0Gi
Swap:          99Mi        90Mi       9.0Mi

command: df -h

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        30G   11G   18G  38% /
devtmpfs        1.8G     0  1.8G   0% /dev
tmpfs           1.9G   50M  1.9G   3% /dev/shm
tmpfs           1.9G  8.6M  1.9G   1% /run
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/mmcblk0p1  253M   48M  205M  19% /boot
tmpfs           383M   12K  383M   1% /run/user/1000
leezu commented 3 years ago

You may not have enough memory to use parallel build. Try cmake --build . -j 1
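As a rough way to pick the job count rather than guessing, the number of parallel compile jobs can be capped by available memory. A minimal sketch, assuming a POSIX shell; the function name suggest_jobs is hypothetical, and the per-job memory budget is a guess that has to be tuned for the files being compiled, not a measured value:

```shell
# Hypothetical helper: choose a -j value from available memory.
suggest_jobs() {
  avail_kb="$1"     # memory available for the build, in KiB
  per_job_kb="$2"   # assumed peak memory of one compiler process, in KiB
  cores="$3"        # upper bound, usually the CPU count
  jobs=$(( avail_kb / per_job_kb ))
  if [ "$jobs" -lt 1 ]; then jobs=1; fi
  if [ "$jobs" -gt "$cores" ]; then jobs="$cores"; fi
  echo "$jobs"
}

# Example (Linux): read MemAvailable from /proc/meminfo, budget 2 GiB per job.
#   avail_kb=$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)
#   cmake --build . -j "$(suggest_jobs "$avail_kb" 2097152 "$(nproc)")"
```

With about 3.0 GiB available, as in the free -h output above, and a 2 GiB budget per job, this yields a single job.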

Btw, it will be much faster for you to cross-compile libmxnet from your dev desktop, which likely has a faster CPU and more memory than the RPi (assuming you are not afraid to learn about cross-compiling).

Cross-compile build env:

https://github.com/apache/incubator-mxnet/blob/8397422aeb4b09fa464b824bfe769f5fd8cb7969/ci/docker/Dockerfile.build.arm#L130-L150

Cross-compile build command

https://github.com/apache/incubator-mxnet/blob/8397422aeb4b09fa464b824bfe769f5fd8cb7969/ci/docker/runtime_functions.sh#L203-L211

Raunak-Singh-Inventor commented 3 years ago

I am cross-compiling libmxnet from my 64-bit Windows desktop. I am setting up the build environment using the debian:buster base image (https://hub.docker.com/_/debian). When I build the environment Dockerfile, I get this error:

Step 4/7 : COPY toolchains/aarch64-linux-gnu-toolchain.cmake /usr
COPY failed: stat /var/lib/docker/tmp/docker-builder467877908/toolchains/aarch64-linux-gnu-toolchain.cmake: no such file or directory

Here is my Dockerfile with the base image of debian:buster. I had to change the Dockerfile a bit to get it to run (installing pip, python3, cmake):

FROM debian:buster

RUN apt-get update && apt-get install -y \
python3 python3-pip
RUN python3 -m pip install --user --upgrade "cmake>=3.13.2"

COPY toolchains/aarch64-linux-gnu-toolchain.cmake /usr
ENV CMAKE_TOOLCHAIN_FILE=/usr/aarch64-linux-gnu-toolchain.cmake

RUN git clone --recursive -b v0.3.12 https://github.com/xianyi/OpenBLAS.git && \
    cd /usr/local/OpenBLAS && \
    make NOFORTRAN=1 NO_SHARED=1 CC=aarch64-linux-gnu-gcc && \
    make PREFIX=/usr/aarch64-linux-gnu NO_SHARED=1 install && \
    cd /usr/local && \
    rm -rf OpenBLAS

RUN git clone --recursive -b v1.2.11 https://github.com/madler/zlib.git && \
    cd /usr/local/zlib && \
    CHOST=arm \
    CC=aarch64-linux-gnu-gcc \
    AR=aarch64-linux-gnu-ar \
    RANLIB=aarch64-linux-gnu-ranlib \
    ./configure --static --prefix=/usr/aarch64-linux-gnu && \
    make -j$(nproc) && \
    make install && \
    cd /usr/local && \
    rm -rf zlib

@leezu Can you give me a complete build env dockerfile? It would be helpful.

leezu commented 3 years ago

@Raunak-Singh-Inventor The complete Dockerfile is linked above. It's tested on CI every day. Also note that you're not required to use Docker here. I'm just providing it for your reference so you can see the dependencies. Generally, you can use any Linux system to do the cross-compilation. I'd discourage the use of Windows.
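On the COPY failure reported earlier: Docker resolves COPY sources relative to the build context, so the error usually means that toolchains/aarch64-linux-gnu-toolchain.cmake does not exist under the directory passed to docker build. A minimal sketch that lists such missing sources before building, assuming a POSIX shell with awk. The function name missing_copy_sources is hypothetical, and the parsing is deliberately simplified: it looks only at the first source of each COPY line and skips lines whose first argument is a flag such as --from:

```shell
# Hypothetical helper: print COPY sources that are absent from the build context.
missing_copy_sources() {
  dockerfile="$1"
  context_dir="$2"
  awk 'toupper($1) == "COPY" && $2 !~ /^--/ { print $2 }' "$dockerfile" |
  while IFS= read -r src; do
    [ -e "$context_dir/$src" ] || echo "$src"
  done
}

# Example: missing_copy_sources Dockerfile .
```

An empty result means every checked source exists. Separately, the copied section targets aarch64, while the diagnostics earlier in this issue report armv7l, so a 32-bit ARM toolchain may be the more relevant one for this board.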