apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Proper Android build instructions #20082

RinatVeliakhmedov-TomTom closed this issue 3 years ago

RinatVeliakhmedov-TomTom commented 3 years ago

Description

Hi. MXNet's README claims Android support, so I want to use its C++ API in an Android app for training and inference. Unfortunately, I could not build it for Android: it depends on OpenBLAS, which has only community support for Android, and, as you might expect, its build instructions are either broken or outdated.

I'd like to request proper end-to-end build instructions that would allow me to build, link, and use the library in an Android app targeting multiple ABIs.

Thank you, Rinat.

leezu commented 3 years ago

Hi @RinatVeliakhmedov-TomTom, at this time you can refer to the following Dockerfile for an example of the cross-compilation environment for armv7 and armv8, including the OpenBLAS build. Specifically, you'll want to get the NDK and build OpenBLAS as follows:

https://github.com/apache/incubator-mxnet/blob/5722f8b38af58c5a296e46ca695bfaf7cff85040/ci/docker/Dockerfile.build.android#L21-L90

Once you have set up the environment (no need to use Docker for that, just follow the same steps), you can trigger the MXNet build itself via

https://github.com/apache/incubator-mxnet/blob/5722f8b38af58c5a296e46ca695bfaf7cff85040/ci/docker/runtime_functions.sh#L244-L255

If you'd like to contribute proper end-to-end build instructions, that would be very welcome. If you run into any issues, please feel free to clarify here.

RinatVeliakhmedov-TomTom commented 3 years ago

Hi @leezu, I already tried to follow those steps from the docker script, but I get an error: blas_server.c:655:16: error: variable has incomplete type 'struct rlimit'

Also, there are only instructions for arm, but not for x86_64 or x86.

leezu commented 3 years ago

> blas_server.c:655:16: error: variable has incomplete type 'struct rlimit'

In that case you may not be following the instructions precisely. For example, did you use a different version of the NDK or OpenBLAS? I ask because the instructions I linked are run for every commit and must pass before a commit can be merged.

Could you provide more details on the issue you encountered? The error alone is hard to diagnose.

> Also, there are only instructions for arm, but not for x86_64 or x86.

If you'd like to contribute proper end-to-end build instructions including x86_64 and x86, that would be very welcome. It would be completely analogous to the setup I shared above.
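As a rough illustration of what "completely analogous" could look like across ABIs, the sketch below builds one CMake configure command per Android ABI. `CMAKE_TOOLCHAIN_FILE`, `ANDROID_ABI` and `ANDROID_PLATFORM` are the NDK's standard CMake variables; the helper name `cmake_configure_command`, the API level, the build directory layout and the particular MXNet options are assumptions for illustration and should be checked against the linked `runtime_functions.sh`. OpenBLAS would still need to be built separately for each ABI.

```python
from pathlib import Path

# The four ABIs supported by current Android NDK releases.
ANDROID_ABIS = ("armeabi-v7a", "arm64-v8a", "x86", "x86_64")


def cmake_configure_command(ndk_root, abi, source_dir, api_level=21):
    """Return the CMake configure command for one ABI as an argument list."""
    if abi not in ANDROID_ABIS:
        raise ValueError(f"unknown Android ABI: {abi}")
    # The NDK ships its CMake toolchain file at this relative path.
    toolchain = Path(ndk_root) / "build" / "cmake" / "android.toolchain.cmake"
    return [
        "cmake",
        "-G", "Ninja",
        f"-DCMAKE_TOOLCHAIN_FILE={toolchain}",
        f"-DANDROID_ABI={abi}",
        f"-DANDROID_PLATFORM=android-{api_level}",
        # Assumed MXNet options; verify against the CI script linked above.
        "-DUSE_CUDA=OFF",
        "-DUSE_OPENCV=OFF",
        "-DUSE_OPENMP=OFF",
        "-S", str(source_dir),
        # Hypothetical layout: one build directory per ABI.
        "-B", str(Path(source_dir) / f"build-{abi}"),
    ]


if __name__ == "__main__":
    for abi in ANDROID_ABIS:
        print(" ".join(cmake_configure_command("/path/to/ndk", abi, "/path/to/mxnet")))
```

Each printed command could then be run by hand or passed to `subprocess.run`, followed by `cmake --build` on the matching build directory.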

RinatVeliakhmedov-TomTom commented 3 years ago

I'm using similar instructions; the only difference is that I'm building on macOS. I tried NDK 17/19c/20b/21 and got the same result.

UPDATE:

I guess the reason was me building with macOS as a host OS. When built manually from ubuntu 20.04 docker image, it worked fine.

Feel free to close the issue if you think nothing should be done with this.

leezu commented 3 years ago

> I guess the reason was me building with macOS as a host OS. When built manually from ubuntu 20.04 docker image, it worked fine.

Great. But in principle things should also work on macOS. Could this be a bug in the NDK?

> Feel free to close the issue if you think nothing should be done with this.

I think proper build instructions would still be helpful. For example, adding an Android selector at https://mxnet.apache.org/versions/master/get_started?version=v1.8.0&platform=devices&iot=raspberry-pi&

Would you like to contribute a guide based on your experience going through the build?

RinatVeliakhmedov-TomTom commented 3 years ago

I am not sure why it didn't work on macOS. I think I did it exactly the same way, but, of course, I could be wrong, and building on a developer machine rather than in a clean Docker image is probably bound to have some hidden pitfalls.

If I manage to build it properly, link it to my app, and use the library from the app, I can update the build instructions, if you'd be OK with that, but I have zero experience with website programming so I'm not sure if I should touch the version selector on your website.

leezu commented 3 years ago

> but I have zero experience with website programming so I'm not sure if I should touch the version selector on your website.

It's also fine to add another file like https://github.com/apache/incubator-mxnet/blob/master/docs/static_site/src/pages/get_started/jetson_setup.md or add a section to https://github.com/apache/incubator-mxnet/blob/master/docs/static_site/src/pages/get_started/build_from_source.md

The exact location of the content, such as integration with the dropdown, should be easy to change later. Writing instructions that cover building the library, linking it into an app, and using it from the app would be a great first step. So don't worry about the dropdown for now; you can just assume a standard markdown document like the ones linked above.

RinatVeliakhmedov-TomTom commented 3 years ago

When building without USE_CPP_PACKAGE, it builds just fine, but when I try to build the C++ package, I get an error:

[386/430] cd /Users/veliakhm/nc/mxnet/cpp-package/scripts && echo Running:\ OpWrapperGenerator.py && python OpWrapperGenerator.py /Users/veliakhm/nc/mxnet/build/libmxnet.so
FAILED: cpp-package/CMakeFiles/cpp_package_op_h ../cpp-package/include/mxnet-cpp/op.h cpp-package/MAIN_DEPENDENCY cpp-package/mxnet
cd /Users/veliakhm/nc/mxnet/cpp-package/scripts && echo Running:\ OpWrapperGenerator.py && python OpWrapperGenerator.py /Users/veliakhm/nc/mxnet/build/libmxnet.so
Running: OpWrapperGenerator.py
Traceback (most recent call last):
  File "OpWrapperGenerator.py", line 433, in <module>
    raise(e)
  File "OpWrapperGenerator.py", line 427, in <module>
    f.write(patternStr % ParseAllOps())
  File "OpWrapperGenerator.py", line 321, in ParseAllOps
    cdll.libmxnet = cdll.LoadLibrary(sys.argv[1])
  File "/Users/veliakhm/opt/anaconda3/lib/python3.7/ctypes/__init__.py", line 442, in LoadLibrary
    return self._dlltype(name)
  File "/Users/veliakhm/opt/anaconda3/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: dlopen(/Users/veliakhm/nc/mxnet/build/libmxnet.so, 6): no suitable image found.  Did find:
    /Users/veliakhm/nc/mxnet/build/libmxnet.so: unknown file type, first eight bytes: 0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00
    /Users/veliakhm/nc/mxnet/build/libmxnet.so: stat() failed with errno=25
ninja: build stopped: subcommand failed.

The same error occurs when building from the ubuntu:20.04 Docker image, so it's probably not macOS's fault this time.

leezu commented 3 years ago

The problem here is that OpWrapperGenerator.py needs to dlopen the libmxnet.so to generate C++ interface code. However, as you cross-compiled the latter, this fails.
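To see why the host cannot load the library, it can help to inspect the ELF header of the cross-compiled file. The first eight bytes reported in the traceback above (`0x7F 0x45 0x4C 0x46 0x02 0x01 0x01 0x00`) are the ELF magic followed by the markers for a 64-bit, little-endian object. The following is a minimal sketch, not part of MXNet, that also decodes the target machine field; the helper name `describe_elf` is hypothetical and only a handful of machine codes are listed.

```python
import struct

# A few e_machine values from the ELF specification.
ELF_MACHINES = {3: "x86", 40: "arm", 62: "x86_64", 183: "aarch64"}


def describe_elf(header):
    """Decode word size, byte order and target machine from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}[header[4]]        # EI_CLASS
    endian = {1: "<", 2: ">"}[header[5]]    # EI_DATA
    # e_machine is a 16-bit field at byte offset 18, right after e_type.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    byte_order = "little" if endian == "<" else "big"
    return bits, byte_order, ELF_MACHINES.get(machine, f"unknown ({machine})")


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as f:
        print(describe_elf(f.read(20)))
```

If the reported machine differs from the host's (or the host is macOS, whose loader expects Mach-O rather than ELF), `cdll.LoadLibrary` on that file cannot succeed natively.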

To solve the problem (if you are on Linux), you could use QEMU with binfmt. As an example of how QEMU can be applied, on our CI we use the following arm Docker container that runs on x86_64 via QEMU:

https://github.com/apache/incubator-mxnet/blob/c7a8ccc7220d0d710e5274d075c4ce4f55c81c37/ci/docker/docker-compose.yml#L148-L168

https://github.com/apache/incubator-mxnet/blob/c7a8ccc7220d0d710e5274d075c4ce4f55c81c37/ci/docker/Dockerfile.test.arm#L19-L43

To solve your issue, you may need to set up your system so that

https://github.com/apache/incubator-mxnet/blob/833cb89e3e7a5262151a3b512d18a82d6de917be/cpp-package/scripts/OpWrapperGenerator.py#L321

succeeds through QEMU (this should be transparently handled by the OS if you set up QEMU binfmt correctly).

Another option is to run the whole compilation on the target architecture, via the QEMU-based container mentioned above.

Thirdly, if you have other ideas to make https://github.com/apache/incubator-mxnet/blob/833cb89e3e7a5262151a3b512d18a82d6de917be/cpp-package/scripts/OpWrapperGenerator.py#L321 architecture-independent, that would also be welcome.
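Before debugging the build itself on Linux, one quick sanity check is whether a QEMU handler is registered with the kernel's binfmt_misc facility at all. The sketch below assumes binfmt_misc is mounted at its usual location and that the handler is registered under the conventional name `qemu-aarch64`; both the helper name `binfmt_handler_enabled` and that entry name are assumptions and may differ on your system.

```python
from pathlib import Path


def binfmt_handler_enabled(name, binfmt_dir="/proc/sys/fs/binfmt_misc"):
    """Return True if a binfmt_misc entry called `name` exists and is enabled."""
    entry = Path(binfmt_dir) / name
    if not entry.is_file():
        return False
    # The first line of an entry file reads "enabled" or "disabled".
    lines = entry.read_text().splitlines()
    return bool(lines) and lines[0].strip() == "enabled"


if __name__ == "__main__":
    # "qemu-aarch64" is the conventional entry name; adjust it for other targets.
    print(binfmt_handler_enabled("qemu-aarch64"))
```

A `False` result suggests the handler still needs to be registered before foreign-architecture binaries can be run transparently.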

leezu commented 3 years ago

Also note that the cpp-package folder was removed in the master branch as it relied on deprecated APIs. Another option is to directly interface with the APIs in https://github.com/apache/incubator-mxnet/tree/master/include/mxnet. There is also a call for contribution to add back a refactored cpp-package in the master branch.