Open marcoabreu opened 6 years ago
@KellenSunderland @larroy @lebeg
For reference, a working build with exactly the same Dockerfile: http://jenkins.mxnet-ci-dev.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/incubator-mxnet/branches/ci-master/runs/306/nodes/67/steps/447/log/?start=0
Step 8/19 : ADD https://api.github.com/repos/xianyi/OpenBLAS/git/refs/tags/v0.2.20 openblas_version.json
---> 96b952173698
Step 9/19 : RUN git clone --recursive -b v0.2.20 https://github.com/xianyi/OpenBLAS.git && cd OpenBLAS && make -j$(nproc) && PREFIX=${CROSS_ROOT} make install
---> Running in d963aa7a538a
Cloning into 'OpenBLAS'...
Note: checking out '5dde4e65d321076582a2fafe16949d2160551e81'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b new_branch_name
make[1]: warning: -jN forced in submake: disabling jobserver mode.
make[1]: Entering directory '/work/OpenBLAS/interface'
/usr/bin/aarch64-linux-gnu-gcc -O2 -DMAX_STACK_ALLOC=2048 -Wall -DF_INTERFACE_GFORT -fPIC -DSMP_SERVER -DNO_WARMUP -DMAX_CPU_NUMBER=72 -march=armv8-a -DASMNAME=saxpy -DASMFNAME=saxpy_ -DNAME=saxpy_ -DCNAME=saxpy -DCHAR_NAME=\"saxpy_\" -DCHAR_CNAME=\"saxpy\" -DNO_AFFINITY -I.. -I. -UDOUBLE -UCOMPLEX -c axpy.c -o saxpy.o
Most noticeable here is the 'unknown-linux' part: aarch64-unknown-linux-gnueabi <-> aarch64-linux-gnu-gcc
The problem seems to be the missing gfortran compiler, which we're setting manually in our Dockerfile. This has to be updated to match whatever is present inside the new Dockerfile.
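A pre-flight check along these lines could make a missing compiler fail fast with a clear message instead of somewhere deep inside the OpenBLAS build. This is only a sketch: the function name check_compiler is made up for illustration, and the name of the cross gfortran binary is an assumption that depends on what the base image actually ships.

```shell
#!/bin/sh
# Hypothetical helper (not part of the MXNet or dockcross Dockerfiles):
# succeed and print the resolved path if the given compiler is on PATH,
# otherwise print an error to stderr and return non-zero.
check_compiler() {
    compiler="$1"
    if path=$(command -v "$compiler" 2>/dev/null); then
        printf 'found %s at %s\n' "$compiler" "$path"
        return 0
    fi
    printf 'missing compiler: %s\n' "$compiler" >&2
    return 1
}
```

In a Dockerfile RUN step this could be called as check_compiler "$FC" || exit 1 before invoking make, assuming FC is the variable that holds the Fortran compiler name.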
Unfortunately, the project does not maintain any tags for their Dockerfiles, so we cannot pin the version and have to fix the problem instead. https://microbadger.com/images/dockcross/linux-arm64
Hey @marcoabreu, how do we currently track the failed builds on the test environment? Do we have nightly runs that build with --no-cache and catch these issues?
We don't track failures on the test environment since it's just for testing of new features for CI. The fact that this gave some useful information was just an unintended side-effect.
We don't have any nightly runs that work without cache. That would definitely be a good feature. I have added a feature request at https://github.com/apache/incubator-mxnet/issues/10839
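For illustration, a nightly job could assemble a cache-free build command roughly like this. It is a sketch under assumptions: the helper name nocache_build_cmd is hypothetical, and it only prints the command (a dry run) so it can be checked before wiring it into CI. --no-cache and --pull are standard docker build flags; --pull additionally forces a fresh pull of the base image, which is where this breakage came from.

```shell
#!/bin/sh
# Hypothetical helper: print the docker build command for a cache-free build.
# Arguments: <dockerfile> <image tag> <build context directory>
nocache_build_cmd() {
    dockerfile="$1"
    tag="$2"
    context="$3"
    # --no-cache ignores cached layers; --pull re-fetches the base image.
    printf 'docker build --no-cache --pull -f %s -t %s %s\n' \
        "$dockerfile" "$tag" "$context"
}
```

For example, nocache_build_cmd docker/Dockerfile.build.jetson mxnet/build.jetson docker prints the command for the Jetson image; any --build-arg values the Dockerfile needs would have to be added by hand.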
I was able to reproduce the issue. The problem seems to be caused by the missing Fortran compiler, whose installation step seems to have been removed as part of a dockcross PR. Do we need to build OpenBLAS with LAPACK support? A workaround could be to skip LAPACK support for now by removing the FC line from the Dockerfile.
We had a discussion about whether anybody is using LAPACK on ARM devices, and the result was: yes, this is needed.
Can you give more info on why LAPACK is needed on ARM?
To enable efficient SVD, matrix decomposition, etc. Maybe more relevant for edge devices than GPUs.
http://jenkins.mxnet-ci-dev.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/ci-master/392/pipeline/67/
Note: This is the test environment.
To reproduce locally, just run:
docker build -f docker/Dockerfile.build.jetson --build-arg USER_ID=1001 -t mxnet/build.jetson docker