Azure / azureml-sdk-for-r

Azure Machine Learning SDK for R
https://azure.github.io/azureml-sdk-for-r/

Run fails when trying to build base Docker image #409

Closed: adfi closed this issue 3 years ago

adfi commented 3 years ago

Describe the bug: When I run the code from train-on-local or from the tutorial vignette, the run repeatedly fails while building the base Docker image.

I have tried with and without specifying the R environment. To me it looks like the Dockerfile has an error.

To Reproduce: Run the train-on-local code.

Expected behavior: A successful run.

Additional context: This is the log output:

[2021-01-21T17:21:58.004378] Entering context manager injector.
[2021-01-21T17:21:58.726307] Using urllib.request Python 3.0 or later
Streaming log file azureml-logs/60_control_log.txt
Starting the daemon thread to refresh tokens in background for process with pid = 3211
Running: ['/bin/bash', '/tmp/azureml_runs/train-r-script-on-local_1611249715_0deb3f7c/azureml-environment-setup/docker_env_checker.sh']

Materialized image not found on target: azureml/azureml_8a19dc11dc12993db888053c941b9e81

[2021-01-21T17:21:59.010399] Logging experiment preparation status in history service.
Running: ['/bin/bash', '/tmp/azureml_runs/train-r-script-on-local_1611249715_0deb3f7c/azureml-environment-setup/docker_env_builder.sh']
Running: ['nvidia-docker', 'build', '-f', 'azureml-environment-setup/Dockerfile', '-t', 'azureml/azureml_8a19dc11dc12993db888053c941b9e81', '.']
Sending build context to Docker daemon 614.9kB
Step 1/20 : FROM mcr.microsoft.com/azureml/base:openmpi3.1.2-ubuntu16.04@sha256:8bc7ffc7142fb2914e40e8d64fed7bb89f7d087b670c0cb3168d241a5e908e98
---> 6ac3db102f47
Step 2/20 : USER root
---> Using cache
---> d642cd4391f3
Step 3/20 : RUN mkdir -p $HOME/.cache
---> Using cache
---> 22dc77554689
Step 4/20 : WORKDIR /
---> Using cache
---> 4b7787d5cdac
Step 5/20 : COPY azureml-environment-setup/99brokenproxy /etc/apt/apt.conf.d/
---> Using cache
---> 50d97c4e873a
Step 6/20 : RUN if dpkg --compare-versions conda --version | grep -oE '[^ ]+$' lt 4.4.11; then conda install conda==4.4.11; fi
---> Using cache
---> 2e7d0679337b
Step 7/20 : COPY azureml-environment-setup/mutated_conda_dependencies.yml azureml-environment-setup/mutated_conda_dependencies.yml
---> Using cache
---> d3144c703504
Step 8/20 : RUN ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" && conda clean -aqy && CONDA_ROOT_DIR=$(conda info --root) && rm -rf "$CONDA_ROOT_DIR/pkgs" && find "$CONDA_ROOT_DIR" -type d -name pycache -exec rm -rf {} + && ldconfig
---> Using cache
---> 9cfe0691a7fa
Step 9/20 : ENV PATH /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad/bin:$PATH
---> Using cache
---> b1cb2a721c71
Step 10/20 : ENV AZUREML_CONDA_ENVIRONMENT_PATH /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad
---> Using cache
---> c5afd7a9c6f4
Step 11/20 : ENV LD_LIBRARY_PATH /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad/lib:$LD_LIBRARY_PATH
---> Using cache
---> 9181591764d8
Step 12/20 : RUN conda install -p /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad -c r -y pip<=20.1.1
---> Running in 2907e9bdbe89
/bin/sh: 1: cannot open =20.1.1: No such file
The command '/bin/sh -c conda install -p /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad -c r -y pip<=20.1.1' returned a non-zero code: 2

CalledProcessError(2, ['nvidia-docker', 'build', '-f', 'azureml-environment-setup/Dockerfile', '-t', 'azureml/azureml_8a19dc11dc12993db888053c941b9e81', '.'])

Building docker image failed with exit code: 2

[2021-01-21T17:22:00.910166] Logging error in history service: Failed to run ['/bin/bash', '/tmp/azureml_runs/train-r-script-on-local_1611249715_0deb3f7c/azureml-environment-setup/docker_env_builder.sh']
Exit code 1
Details can be found in azureml-logs/60_control_log.txt log file.

Uploading control log..
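
The error at step 12 (`cannot open =20.1.1: No such file`) is consistent with `/bin/sh` treating the unquoted `<` in `pip<=20.1.1` as an input redirection from a file named `=20.1.1`, instead of passing the whole version specifier to conda. The following is a minimal sketch of that shell behavior, not the SDK's actual Dockerfile or its fix: `echo` stands in for `conda install` (an assumption made purely for illustration), and everything runs in a temporary directory.

```shell
#!/bin/sh
# Sketch of the suspected parsing problem; conda is not required.
# `echo install` is a hypothetical stand-in for `conda install`.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Unquoted: sh reads `<` as "redirect stdin from the file =20.1.1".
# That file does not exist, so the command fails before it runs.
sh -c 'echo install pip<=20.1.1' >unquoted.out 2>unquoted.err
unquoted_status=$?

# Quoted: the specifier reaches the command as one literal argument.
sh -c 'echo install "pip<=20.1.1"' >quoted.out 2>quoted.err
quoted_status=$?

echo "unquoted exit status: $unquoted_status"
cat unquoted.err
echo "quoted exit status: $quoted_status"
cat quoted.out
```

If this reading is right, quoting the specifier in the generated Dockerfile (for example `"pip<=20.1.1"`) would avoid the redirection. This thread does not say what form the eventual fix took.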

And my session info:

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2018.2.199/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] azuremlsdk_1.10.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.6 lattice_0.20-41 digest_0.6.27 rappdirs_0.3.1 R6_2.5.0 grid_3.6.3
[7] jsonlite_1.7.2 magrittr_2.0.1 rlang_0.4.10 DT_0.16 Matrix_1.3-0 reticulate_1.18-9004
[13] tools_3.6.3 htmlwidgets_1.5.3 crosstalk_1.1.0.1 yaml_2.2.1 xfun_0.19 compiler_3.6.3
[19] htmltools_0.5.0 knitr_1.30

adfi commented 3 years ago

I forgot to mention that I specified the R environment in two ways:

  1. By creating an almost blank R environment: env <- r_environment("name")

  2. By specifying a custom Docker image as per the vignette: env <- r_environment("your-env-name", custom_docker_image = "<repository_name>.azurecr.io/<image_name>:<tag>")

The first method produced the same error. The second did not, and the run completed successfully.

dareneiri commented 3 years ago

I am running into the same error, which makes me feel a bit better because this was working just a month ago when I ran the example code.

Specifically, I get the same error output:

Step 12/20 : RUN conda install -p /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad -c r -y pip<=20.1.1
---> Running in 2907e9bdbe89
/bin/sh: 1: cannot open =20.1.1: No such file

I have not tried to create a custom Docker image.

diondrapeck commented 3 years ago

@adfi and @dareneiri - This was the result of a regression. It has been fixed.