apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

[RFC] MXNet AArch64 wheels #20251

Open mseth10 opened 3 years ago

mseth10 commented 3 years ago

Problem statement

Currently, Apache MXNet does not publish wheels for AArch64-based platforms. I would like to propose adding AArch64 support to our CI/CD as well as the stable-release Jenkins pipelines. MXNet already supports AArch64-based platforms: in CI, we cross-compile MXNet for AArch64 targets on Ubuntu and Android OS. For wheel generation and testing, we can use Amazon EC2 instances powered by AWS Graviton2 processors and a native-compilation toolchain. To get the best performance out of the wheels, we can evaluate different build options and either ship the best configuration or provide several options for our users to choose from. The build options include:

- choice of BLAS (OpenBLAS, Eigen BLAS, Arm Performance Libraries)
- choice of performance libraries (OneDNN, Arm Compute Library, XNNPACK)
- different build flag settings (-march, -mtune, -mcpu, -moutline-atomics) [1][2]
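To compare candidate configurations, a quick way to see which options an installed wheel was actually compiled with is the `mxnet.runtime` feature API. A minimal sketch follows; the specific feature names checked (`MKLDNN`, `BLAS_OPEN`, `OPENMP`) are assumptions based on MXNet 1.x naming and may differ per branch.

```python
# Minimal sketch: report the compile-time features of an installed MXNet wheel,
# so different AArch64 build configurations can be compared side by side.
# Assumption: feature names like 'MKLDNN' and 'BLAS_OPEN' follow MXNet 1.x naming.
import mxnet as mx

features = mx.runtime.Features()  # mapping of feature name -> Feature object
enabled = sorted(f.name for f in mx.runtime.feature_list() if f.enabled)
print("Enabled features:", ", ".join(enabled))

for name in ("MKLDNN", "BLAS_OPEN", "OPENMP"):
    if name in features:
        print(f"{name}: {'on' if features.is_enabled(name) else 'off'}")
```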

Proposed solutions

I have been able to build and test the MXNet v1.x branch with OpenBLAS and OneDNN. The resulting binary runs on all AArch64 architectures (ARMv8-A, ARMv8.1-A, ARMv8.2-A, ...), but in order to make use of the Large System Extensions (LSE) introduced in ARMv8.1-A, I had to build with the GCC flag -moutline-atomics, which is supported in gcc-10 only. Using this build in the CD pipelines would mean moving to a base Docker image that provides gcc-10 (we currently use Ubuntu 18.04). We can get rid of the -moutline-atomics flag (and the gcc-10 dependency) if we build for base architecture ARMv8.1-A (-march=armv8.1-a), but then the binary won't execute on ARMv8-A based platforms. We can also optimize the build for a particular micro-architecture using other build flags like -mtune / -mcpu. Any suggestions are appreciated.

Arm has added experimental support for Arm Performance Libraries and the Arm Compute Library to OneDNN [3]. The next step would be to evaluate this support and enable it in MXNet.
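As a quick sanity check on a given host (e.g. when deciding whether an -march=armv8.1-a build would even run, or whether -moutline-atomics pays off), here is a minimal sketch that checks whether the CPU reports LSE atomics. It assumes a Linux system where the kernel exposes LSE support as the `atomics` flag in /proc/cpuinfo.

```python
# Minimal sketch: detect LSE atomics on the host AArch64 CPU.
# Assumption: Linux reports LSE support as the 'atomics' flag on the
# 'Features' line of /proc/cpuinfo (present on ARMv8.1-A and later cores).
import platform

def has_lse_atomics() -> bool:
    if platform.machine() != "aarch64":
        return False
    try:
        with open("/proc/cpuinfo") as f:
            cpuinfo = f.read()
    except OSError:
        return False
    for line in cpuinfo.splitlines():
        if line.lower().startswith("features"):
            return "atomics" in line.split(":", 1)[1].split()
    return False

if __name__ == "__main__":
    print("LSE atomics available:", has_lse_atomics())
```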

References

mseth10 commented 3 years ago

WIP PR: https://github.com/apache/incubator-mxnet/pull/20252

ddelange commented 1 year ago

hi @mseth10,

first of all, thanks a lot for the above PR. could you give an indication of the complexity of cherry-picking these aarch64 builds onto the cuda wheels?

would be cool to run GPU inference on g5.xlarge
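for context, here is a minimal sketch of the kind of check one could run on such an instance to confirm the installed wheel is CUDA-enabled and sees a GPU; it assumes the MXNet 1.x API surface (mx.runtime, mx.context), not any specific wheel from this RFC.

```python
# Minimal sketch: confirm the installed MXNet wheel was built with CUDA and
# that at least one GPU is visible before attempting GPU inference.
# Assumption: MXNet 1.x API surface (mx.runtime, mx.context, mx.nd).
import mxnet as mx

cuda_built = mx.runtime.Features().is_enabled("CUDA")
gpu_count = mx.context.num_gpus() if cuda_built else 0
print(f"CUDA compiled in: {cuda_built}, GPUs detected: {gpu_count}")

if gpu_count:
    # Tiny smoke test on the first GPU.
    x = mx.nd.ones((2, 2), ctx=mx.gpu(0))
    print((x + 1).asnumpy())
```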

ddelange commented 1 year ago

first hiccup ^ libquadmath0 is missing on arm64
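as a quick way to reproduce that missing-dependency check on a given arm64 image, here is a minimal sketch (not taken from the CI logs) that asks the system loader whether libquadmath is resolvable at all; on AArch64 it historically has not been shipped with GCC, which is what breaks wheels linking against it.

```python
# Minimal sketch: check whether libquadmath can be resolved on this host.
# ctypes.util.find_library consults the loader cache (ldconfig) on Linux;
# on arm64 images it is typically absent, which breaks wheels linking to it.
import ctypes.util
import platform

lib = ctypes.util.find_library("quadmath")
print(f"arch={platform.machine()}, libquadmath: {lib if lib else 'not found'}")
```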