apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Cross-platform docker build fails on Mac, won't work on Windows (?) #13759

Open matthewberryman opened 5 years ago

matthewberryman commented 5 years ago

Description

The cross-platform Docker build fails on macOS, and quite possibly on Windows as well.

Environment info (Required)

----------Python Info----------
Version : 3.7.2
Compiler : Clang 10.0.0 (clang-1000.11.45.5)
Build : ('default', 'Dec 27 2018 07:35:06')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 18.1
Directory : /usr/local/lib/python3.7/site-packages/pip
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
Platform : Darwin-18.2.0-x86_64-i386-64bit
system : Darwin
node : demeter.lan
release : 18.2.0
version : Darwin Kernel Version 18.2.0: Fri Dec 14 18:43:36 PST 2018; root:xnu-4903.240.10~4/RELEASE_X86_64
----------Hardware Info----------
machine : x86_64
processor : i386
b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 AVX2 BMI2 INVPCID SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0078 sec, LOAD: 1.7442 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.0006 sec, LOAD: 0.7151 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0008 sec, LOAD: 0.8128 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0007 sec, LOAD: 1.0083 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0117 sec, LOAD: 0.7682 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0008 sec, LOAD: 0.2685 sec.

Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio): Whatever is in the docker container.

MXNet commit hash: https://github.com/apache/incubator-mxnet/commit/d7f9a075e4bc86158944a1af1071fabe40da6498

Build command:

ci/build.py -p armv7

Error Message:

Step 17/19 : RUN /work/ubuntu_adduser.sh
 ---> Running in 68f227b99190
+ [[ 501 -gt 0 ]]
+ [[ -n 20 ]]
+ [[ 20 -gt 0 ]]
+ groupadd --gid 20 --system jenkins_slave
groupadd: GID '20' already exists
The command '/bin/sh -c /work/ubuntu_adduser.sh' returned a non-zero code: 4
Traceback (most recent call last):
  File "ci/build.py", line 571, in <module>
    sys.exit(main())
  File "ci/build.py", line 485, in main
    num_retries=args.docker_build_retries, no_cache=args.no_cache)
  File "ci/build.py", line 160, in build_docker
    run_cmd()
  File "/Users/matthew/code/incubator-mxnet/ci/util.py", line 78, in f_retry
    return f(*args, **kwargs)
  File "ci/build.py", line 158, in run_cmd
    check_call(cmd)
  File "/usr/local/Cellar/python/3.7.2/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['docker', 'build', '-f', 'docker/Dockerfile.build.armv7', '--build-arg', 'USER_ID=501', '--build-arg', 'GROUP_ID=20', '--cache-from', 'mxnetci/build.armv7', '-t', 'mxnetci/build.armv7', 'docker']' returned non-zero exit status 4.

Steps to reproduce


ci/build.py -p armv7

What have you tried to solve it?

Identified the cause of the problem: on a Mac, Docker containers run inside a VM rather than directly on the host via cgroups etc., so the UID and GID passed in from https://github.com/apache/incubator-mxnet/blob/d7f9a075e4bc86158944a1af1071fabe40da6498/ci/build.py#L147 are those of the Mac user, not ones that are available within the Docker host VM (and, in turn, within the container). In the log above, the Mac user's GID 20 already exists in the image, so groupadd exits with code 4 and the build stops.
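To make the collision concrete, here is a minimal sketch of the check that groupadd effectively performs. The helper name `gid_in_use` and the sample group file are hypothetical and only for illustration; the sample is not copied from the actual build image, although stock Ubuntu images do normally assign GID 20 to `dialout`.

```python
def gid_in_use(group_file_text, gid):
    """Return the name of the group that already owns `gid`, or None.

    `group_file_text` is the content of an /etc/group style file, where
    each line has the form name:password:gid:member,member
    """
    for line in group_file_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3 and fields[2] == str(gid):
            return fields[0]
    return None


# Illustrative sample only (assumption): not taken from the real image.
SAMPLE_GROUP_FILE = """\
root:x:0:
daemon:x:1:
dialout:x:20:
"""

# GROUP_ID=20 is what the failing build passed in from the Mac host.
print(gid_in_use(SAMPLE_GROUP_FILE, 20))
print(gid_in_use(SAMPLE_GROUP_FILE, 1000))
```

A GID that is unused inside the image would pass this check; the host's GID 20 does not, which matches the "GID '20' already exists" message.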

I am guessing Docker for Windows suffers from a related issue, since Python on Windows does not provide os.getuid() (ref https://github.com/Parsely/streamparse/issues/415 ) or os.getgid().
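One possible way to guard against the missing calls is sketched below. This is not the project's code: `host_ids` and `docker_id_build_args` are hypothetical helpers, and the fallback value 0 is an assumption. The build-arg names USER_ID and GROUP_ID are taken from the failing command in the traceback, and the trace suggests ubuntu_adduser.sh only acts on IDs greater than 0, but whether 0 is a safe fallback has not been verified.

```python
import os


def host_ids(default_uid=0, default_gid=0):
    """Return (uid, gid) of the calling user.

    os.getuid() and os.getgid() exist only on Unix-like platforms, so on
    Windows the supplied defaults are returned instead (assumed fallback).
    """
    getuid = getattr(os, "getuid", None)
    getgid = getattr(os, "getgid", None)
    if getuid is None or getgid is None:
        return default_uid, default_gid
    return getuid(), getgid()


def docker_id_build_args(uid, gid):
    """Build the --build-arg pairs seen in the failing docker command."""
    return [
        "--build-arg", "USER_ID={}".format(uid),
        "--build-arg", "GROUP_ID={}".format(gid),
    ]


uid, gid = host_ids()
print(docker_id_build_args(uid, gid))
```

This only avoids the AttributeError on Windows. It does not fix the macOS case, where the calls succeed but return IDs that clash with groups already present in the image.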

mxnet-label-bot commented 5 years ago

Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended labels: Build

leleamol commented 5 years ago

@mxnet-label-bot add [Build]