apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

[Julia CI pipeline] reduce complexity and deps #16306

Open aaronmarkham opened 5 years ago

aaronmarkham commented 5 years ago

Description

For the other language bindings, I was able to cut the dependencies down considerably by starting from the "lite" dockerfile and layering on the minimum dependencies. For some reason, every time I tried that with Julia, I ended up with a broken build when Julia tried to load the MXNet binary. So the Julia build currently uses what is essentially the fully-loaded ubuntu_cpu dockerfile, which installs things like Clojure and Scala that it doesn't need.

The issue is that if anything goes wrong with the fully-loaded build pipeline, the Julia build breaks and snowballs into breaking the rest of the website publishing. For example, we're experiencing an issue with MKL right now, and since the Julia docs build installs MKL too, the Julia docs break and bring the whole website down with them.

Expected results

I want to see the Julia dockerfile slimmed down to the bare minimum dependencies, to reduce fallout from unrelated problems.

How to reproduce

You can reproduce the build problem by following these steps.

  1. Run a "lite" binary build (instead of the fully-loaded one).

    ci/build.py --docker-registry mxnetci --platform ubuntu_cpu_lite /work/runtime_functions.sh build_ubuntu_cpu_docs

  2. Run the Julia docs build:

    ci/build.py --docker-registry mxnetci --platform ubuntu_cpu_julia /work/runtime_functions.sh build_julia_docs

mxnet-label-bot commented 5 years ago

Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended label(s): CI, Build

iblislin commented 5 years ago

(cc me)

iblislin commented 5 years ago

I tried to run a "lite" binary build and hit this error:

make: *** No rule to make target '/home/iblis/git/mxnet/3rdparty/dmlc-core/include/dmlc/registry.h', needed by 'build/src/operator/tensor/elemwise_binary_broadcast_op_basic.o'.  Stop.
make: *** Waiting for unfinished jobs....
build.py: 2019-09-28 11:06:51,320Z INFO Waiting for status of container deff28a1d234 for 600 s.
build.py: 2019-09-28 11:06:51,517Z INFO Container exit status: {'Error': None, 'StatusCode': 2}
build.py: 2019-09-28 11:06:51,517Z ERROR Container exited with an error 😞
build.py: 2019-09-28 11:06:51,517Z INFO Executed command for reproduction:

ci/build.py --docker-registry mxnetci --platform ubuntu_cpu_lite /work/runtime_functions.sh build_ubuntu_cpu_docs

build.py: 2019-09-28 11:06:51,517Z INFO Stopping container: deff28a1d234
build.py: 2019-09-28 11:06:51,520Z INFO Removing container: deff28a1d234

Any idea about this error on master? I think it's unrelated to both Julia and MKL. The submodules are already up to date on my box.

aaronmarkham commented 5 years ago

I just fetched upstream/master, ran that command, and didn't get an error. I've found that when I have a bunch of branches around and switch between them a lot, I have to run make clean and update the submodules to get back to a buildable state. Something happened with the submodules recently, and I had to delete all of them and check them out again. Starting fresh in a new directory works too, but that's a last resort.
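
For anyone hitting the same "No rule to make target" error, the cleanup described above could look roughly like the sketch below. The function names reset_checkout and recheckout_submodules are hypothetical helpers, not part of the MXNet repo, and the sketch assumes the checkout has a top-level Makefile with a clean target; only standard git submodule subcommands are used.

```shell
# Hypothetical helper (not part of the MXNet repo): get a checkout back into a
# buildable state after switching between many branches.
reset_checkout() {
    local repo="${1:-.}"

    # Drop stale build artifacts, if the checkout has a top-level Makefile.
    if [ -f "$repo/Makefile" ]; then
        make -C "$repo" clean
    fi

    # Re-read submodule URLs from .gitmodules, then check out the pinned commits.
    git -C "$repo" submodule sync --recursive
    git -C "$repo" submodule update --init --recursive
}

# Heavier variant: drop every submodule working tree and check them out again.
recheckout_submodules() {
    local repo="${1:-.}"
    git -C "$repo" submodule deinit --force --all
    git -C "$repo" submodule update --init --recursive
}

# Example (the path is a placeholder):
#   reset_checkout ~/git/mxnet
```

The lighter reset_checkout is probably worth trying first; recheckout_submodules discards any local changes inside the submodule directories.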

iblislin commented 5 years ago

Starting fresh in a new directory works too, but that's a last resort.

Ah, git-worktree is a lifesaver in this case. :p

The example in its man page fits this case exactly.
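
A minimal sketch of the git-worktree approach, assuming an existing clone: it adds a second working directory for the same repository instead of re-cloning. The helper name fresh_worktree and the paths are placeholders. One caveat: as far as I recall, the git-worktree documentation describes submodule support in multiple checkouts as incomplete, so treat the submodule step as something to verify on your own setup.

```shell
# Hypothetical helper: create a second working directory for an existing clone.
# Pass an absolute path for the new directory; a relative one is resolved
# against the repository directory because of `git -C`.
fresh_worktree() {
    local repo="$1" dir="$2" ref="${3:-HEAD}"

    # --detach avoids the "already checked out" error when the ref is the
    # branch currently checked out in the main working tree.
    git -C "$repo" worktree add --detach "$dir" "$ref"

    # Submodules are not populated in a new worktree, so initialize them.
    git -C "$dir" submodule update --init --recursive
}

# Example (paths are placeholders):
#   fresh_worktree ~/git/mxnet ~/git/mxnet-fresh master
```

When the extra directory is no longer needed, `git worktree remove <path>` (available in recent git versions) cleans it up.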
