Closed hermandr closed 4 years ago
I'm tagging @lucazav since he authored this article.
Hi @hermandr ,
that article was written a few months ago. The current R SDK version may not be compatible with the old Python SDK version 1.10.0. According to this page, the latest Python SDK version is 1.16.0. Try replacing the last line of your Dockerfile with the following:
RUN R -e "azuremlsdk::install_azureml(version = '1.16.0', remove_existing_env = TRUE)"
and then re-build your Docker image.
@hermandr since the Dockerfile you are testing doesn't install any particular packages, I suppose your code works fine with the default Docker image. Is that right?
log_metric does not work. Please try creating a v4: re-build your image with the latest versions of azureml and test whether you get the same result as I do. I believe that when I use "FROM..." it just uses the pre-built image of your v3 build. But when I build with the same Dockerfile, log_metric fails, probably due to changes in the azureml SDK since your build date.
Herman
@lucazav Sorry I missed this post. Let me try with SDK v1.16.0. I will post the result here.
Herman
Hi @lucazav,
I can confirm that the error still occurs with v1.16.0:
Log metrics on azureml
Error in py_get_attr_impl(x, name, silent) :
AttributeError: module 'azureml' has no attribute 'core'
Calls: log_metric_to_run ... py_get_attr_or_item -> py_get_attr -> py_get_attr_impl
Execution halted
2020/10/22 14:17:59 logger.go:297: Failed to run the wrapper cmd with err: exit status 1
2020/10/22 14:17:59 logger.go:297: Attempt 1 of http call to http://10.0.0.4:16384/sendlogstoartifacts/status
2020/10/22 14:17:59 sysutils_linux.go:221: mpirun version string: {
mpirun (Open MPI) 3.1.2
Report bugs to http://www.open-mpi.org/community/help/
}
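Because the failed log_metric_to_run call halts the script ("Execution halted"), anything after it in the entry script, such as a final sessionInfo(), never reaches the driver log. A minimal sketch of a wrapper that reports the failure and lets the script continue is shown below. The name log_metric_safely and its logger argument are hypothetical and not part of azuremlsdk; the sketch only assumes that log_metric_to_run takes a name and a value, as in the call that fails above.

```r
# Hypothetical wrapper (not part of azuremlsdk): try to log a metric, but turn
# a failure into a message so the rest of the entry script still runs.
log_metric_safely <- function(name, value,
                              logger = azuremlsdk::log_metric_to_run) {
  tryCatch(
    {
      logger(name, value)
      TRUE
    },
    error = function(e) {
      # Errors raised on the Python side arrive in R as ordinary errors.
      message("Metric '", name, "' was not logged: ", conditionMessage(e))
      FALSE
    }
  )
}
```

This only helps with collecting diagnostics from a failing run. It does not address the underlying AttributeError.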
Hi @hermandr,
if so, it would be a bug in the latest release. The R SDK PM told me there are a number of bugs still to be fixed in the 1.10.0 version released on CRAN.
@diondrapeck could you investigate this bug, please? I think it doesn't depend on the custom Docker image, as it simply installs the latest versions of both SDKs.
Thank you.
@hermandr could you please install the latest versions of the SDKs on your Compute Instance using RStudio, like this:
remotes::install_github('https://github.com/Azure/azureml-sdk-for-r')
azuremlsdk::install_azureml(version = '1.16.0', remove_existing_env = TRUE)
and then try to run your code using the default Docker image? If it still fails, that confirms the bug is in the SDKs.
Thank you.
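To check that both installs took effect, the versions can be read back from the same R session. This is a minimal sketch: installed_version and python_sdk_version are hypothetical helper names, and the Python part assumes that reticulate can import azureml.core and that the module exposes a VERSION attribute, as the Python SDK does.

```r
# Hypothetical helper: version of an installed R package, or NA if absent.
installed_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Hypothetical helper: version reported by the Python SDK that reticulate
# binds to, or NA if reticulate or azureml.core is not available.
python_sdk_version <- function() {
  if (!requireNamespace("reticulate", quietly = TRUE)) return(NA_character_)
  if (!reticulate::py_module_available("azureml.core")) return(NA_character_)
  as.character(reticulate::import("azureml.core")$VERSION)
}

# Usage, in the RStudio session after installation:
# c(r = installed_version("azuremlsdk"), python = python_sdk_version())
```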
1. Created a new compute instance
   Virtual machine size: STANDARD_D2_V3 (2 Cores, 8 GB RAM, 50 GB Disk)
   Processing unit: CPU - General purpose
2. In the compute instance RStudio, check the default versions of the azureml SDK
3. In RStudio, install the latest versions of the azureml SDK for R and Python
   remotes::install_github('https://github.com/Azure/azureml-sdk-for-r')
   azuremlsdk::install_azureml(version = '1.16.0', remove_existing_env = TRUE)
4. Check the SDK versions in R and Python after installation
5. Upload a minimal estimator script that runs only log_metric
Submit experiment code:
library(azuremlsdk)
setwd("~/cloudfiles/code/Users/oratsl")
sp_auth <- service_principal_authentication(
  tenant_id = Sys.getenv("TENANT_ID"),
  service_principal_id = Sys.getenv("SERVICE_PRINCIPAL_ID"),
  service_principal_password = Sys.getenv("SERVICE_PRINCIPAL_PASSWORD")
)
ws <- get_workspace(
  "hermanml",
  subscription_id = "1dbf72ea-fdeb-46cb-a58f-b873e8f2ae4e",
  resource_group = "Machine-Learning",
  auth = sp_auth
)
cluster_name <- "ml-compute"
compute_target <- get_compute(ws, cluster_name = cluster_name)
if (is.null(compute_target)) stop("Training cluster not found")
exp <- experiment(ws, "minimal")
est_minimal <- estimator(
  source_directory = "minimal",
  entry_script = "minimal_estimator.R",
  script_params = list("--note" = "hermantansg/r-sdk-docker-img:default",
                       "--instance" = "cloud"),
  compute_target = compute_target
)
run <- submit_experiment(exp, est_minimal)
Estimator code:
message("libs")
library(azuremlsdk)
library(optparse)
library(dplyr)
library(purrr)
library(tidyr)
library(reticulate)
message("Check python config")
tibble(p = list(py_discover_config())) %>%
  mutate(python = map_chr(p, "python"),
         libpython = map_chr(p, "libpython"),
         pythonhome = map_chr(p, "pythonhome"),
         virtualenv = map_chr(p, "virtualenv"),
         virtualenv_activate = map_chr(p, "virtualenv_activate"),
         version_string = map_chr(p, "version_string"),
         version = map_chr(p, "version"),
         architecture = map_chr(p, "architecture"),
         annaconda = map_lgl(p, "anaconda"),
         numpy = map(p, "numpy"),
         numpy = map_chr(numpy, "path"),
         python_versions = map(p, "python_versions"),
         python_versions = map_chr(python_versions, ~paste(.x, collapse = ":"))
  ) %>%
  select(-p) %>%
  gather(key = "py_config_parameter", value = "value") %>%
  unite(s, py_config_parameter, value, sep = ": ", remove = TRUE) %>%
  as.matrix() %>%
  write(., stderr())
message("Check environments")
conda_list()
message("Check if azureml is accessible")
py_module_available("azureml")
message("Check if azureml.core is accessible")
py_module_available("azureml.core")
message("List of python modules and versions")
system("pip list")
###################################
message("optparse add options")
options <- list(
  make_option(c("-n", "--note"), action = "store", dest = "note",
              default = "No notes", help = "Note on submit"),
  make_option(c("-i", "--instance"), action = "store", dest = "instance",
              default = "local", help = "Location of compute instance local or azureml")
)
message("OptionParser")
opt_parser <- OptionParser(option_list = options)
opt <- parse_args(opt_parser)
message("Submit note:", opt$note)
if (opt$instance != "local") {
  message("Log metrics on azureml")
  log_metric_to_run("Method", "GLM")
}
message("End of run")
message("Session Info")
sessionInfo()
[minimal-default.zip](https://github.com/Azure/azureml-sdk-for-r/files/5424990/minimal-default.zip)
6. Run the submit code
7. Wait for estimator to complete and log results:
No error, successful run
[70_driver_log (1).txt](https://github.com/Azure/azureml-sdk-for-r/files/5425009/70_driver_log.1.txt)
Docker build log:
[20_image_build_log.txt](https://github.com/Azure/azureml-sdk-for-r/files/5425012/20_image_build_log.txt)
*Summary:*
1. Submit code is running latest version of azureml sdk on python and R in compute instance RStudio
2. Compute cluster running the default image runs successfully
@lucazav this is your Dockerfile based on v3, which I changed to azuremlsdk v1.16.0:
FROM mcr.microsoft.com/azureml/base:openmpi3.1.2-ubuntu16.04
RUN conda install -c r -y \
r-essentials=3.6.0 \
r-reticulate \
rpy2 \
r-remotes \
r-rodbc \
r-e1071 \
r-optparse && \
conda clean -ay && pip install --no-cache-dir azureml-defaults
RUN apt-get update && apt-get install -y \
tzdata \
zlib1g-dev && \
apt-get clean
ENV TAR="/bin/tar"
# Set default locale
ENV LANG C.UTF-8
# Set default timezone
ENV TZ UTC
RUN R -e "remotes::install_github('https://github.com/Azure/azureml-sdk-for-r')"
RUN R -e "azuremlsdk::install_azureml(version = '1.16.0', remove_existing_env = TRUE)"
From the 20_image_build_log.txt of the default Docker build, I reconstructed the Dockerfile:
FROM mcr.microsoft.com/azureml/base:openmpi3.1.2-ubuntu16.04@sha256:8bc7ffc7142fb2914e40e8d64fed7bb89f7d087b670c0cb3168d241a5e908e98
USER root
RUN mkdir -p $HOME/.cache
WORKDIR /
COPY azureml-environment-setup/99brokenproxy /etc/apt/apt.conf.d/
RUN if dpkg --compare-versions `conda --version | grep -oE '[^ ]+$'` lt 4.4.11; then conda install conda==4.4.11; fi
COPY azureml-environment-setup/mutated_conda_dependencies.yml azureml-environment-setup/mutated_conda_dependencies.yml
RUN ldconfig /usr/local/cuda/lib64/stubs && \
conda env create -p /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad -f azureml-environment-setup/mutated_conda_dependencies.yml && \
rm -rf "$HOME/.cache/pip" && \
conda clean -aqy && \
CONDA_ROOT_DIR=$(conda info --root) && \
rm -rf "$CONDA_ROOT_DIR/pkgs" && \
find "$CONDA_ROOT_DIR" -type d -name __pycache__ -exec rm -rf {} + && \
ldconfig
ENV PATH /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad/bin:$PATH
ENV AZUREML_CONDA_ENVIRONMENT_PATH /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad
ENV LD_LIBRARY_PATH /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad/lib:$LD_LIBRARY_PATH
RUN conda install -p /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad -c r -y \
r-essentials=3.6.0 \
rpy2 \
r-checkpoint && \
pip install --no-cache-dir azureml-defaults
ENV TAR="/bin/tar"
RUN R -e "library(checkpoint); \
snapshot_date <- tail(checkpoint::getValidSnapshots(), n = 1); \
setSnapshot(snapshot_date); \
install.packages(c('reticulate', 'remotes', 'e1071', 'optparse')); \
library(remotes); \
remotes::install_cran('azuremlsdk', upgrade = FALSE);"
COPY azureml-environment-setup/spark_cache.py azureml-environment-setup/log4j.properties /azureml-environment-setup/
RUN if [ $SPARK_HOME ]; then /bin/bash -c '$SPARK_HOME/bin/spark-submit /azureml-environment-setup/spark_cache.py'; fi
ENV AZUREML_ENVIRONMENT_IMAGE True
CMD ["bash"]
Running docker build using this Dockerfile failed:
\r-sdk-docker-image>docker build -t azureuser/r-sdk-docker-img .
[+] Building 0.1s (24/25)
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 32B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for mcr.microsoft.com/azureml/base:openmpi3.1.2-ubuntu16.04@sha256:8bc7ffc7142fb2914e40e8d64fed7bb89f7d087b670c0cb3168d241a5e908e98 0.0s
=> [1/22] FROM mcr.microsoft.com/azureml/base:openmpi3.1.2-ubuntu16.04@sha256:8bc7ffc7142fb2914e40e8d64fed7bb89f7d087b670c0cb3168d241a5e908e98 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 2B 0.0s
=> CACHED [2/22] RUN mkdir -p $HOME/.cache 0.0s
=> ERROR [3/22] COPY azureml-environment-setup/99brokenproxy /etc/apt/apt.conf.d/ 0.0s
=> CACHED [4/22] RUN if dpkg --compare-versions `conda --version | grep -oE '[^ ]+$'` lt 4.4.11; then conda install conda==4.4.11; fi 0.0s
=> ERROR [5/22] COPY azureml-environment-setup/mutated_conda_dependencies.yml azureml-environment-setup/mutated_conda_dependencies.yml 0.0s
=> CACHED [6/22] RUN ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad -f azureml-environment-setup/mutated_conda_dependencies.yml && rm -rf "$HOME/.cache/pip" 0.0s
=> CACHED [7/22] RUN conda install -c r -y r-essentials=3.6.0 rpy2 r-checkpoint r-remotes r-rodbc r-e1071 r-reticulate r-optparse && conda clean -ay && pip install --no-cache-dir azureml-defaults 0.0s
=> CACHED [8/22] RUN apt-get update && apt-get install -y tzdata zlib1g-dev && apt-get clean 0.0s
=> CACHED [9/22] RUN R -e "remotes::install_github('https://github.com/Azure/azureml-sdk-for-r')" 0.0s
=> CACHED [10/22] RUN R -e "azuremlsdk::install_azureml(version = '1.16.0', remove_existing_env = TRUE)" 0.0s
=> CACHED [11/22] RUN R -e "library(checkpoint); snapshot_date <- tail(checkpoint::getValidSnapshots(), n = 1); setSnapshot(snapshot_date)" 0.0s
=> CACHED [12/22] RUN R -e "remotes::install_github('https://github.com/Azure/azureml-sdk-for-r')" 0.0s
=> CACHED [13/22] RUN R -e "azuremlsdk::install_azureml(version = '1.16.0', remove_existing_env = TRUE)" 0.0s
=> CACHED [14/22] RUN apt-get install -y pkg-config 0.0s
=> CACHED [15/22] RUN R -e "install.packages('data.table', repos='http://cran.rstudio.com/')" 0.0s
=> CACHED [16/22] RUN R -e "install.packages('xgboost', version='0.82.0.1', repos='http://cran.rstudio.com/')" 0.0s
=> CACHED [17/22] RUN R -e "install.packages('tidyverse', repos='http://cran.rstudio.com/')" 0.0s
=> CACHED [18/22] RUN R -e "install.packages('caret', repos='http://cran.rstudio.com/')" 0.0s
=> CACHED [19/22] RUN R -e "install.packages('ggfortify', repos='http://cran.rstudio.com/')" 0.0s
=> ERROR [20/22] COPY azureml-environment-setup/spark_cache.py azureml-environment-setup/log4j.properties /azureml-environment-setup/ 0.0s
------
> [3/22] COPY azureml-environment-setup/99brokenproxy /etc/apt/apt.conf.d/:
------
------
> [5/22] COPY azureml-environment-setup/mutated_conda_dependencies.yml azureml-environment-setup/mutated_conda_dependencies.yml:
------
------
> [20/22] COPY azureml-environment-setup/spark_cache.py azureml-environment-setup/log4j.properties /azureml-environment-setup/:
------
failed to solve with frontend dockerfile.v0: failed to build LLB: failed to compute cache key: "/azureml-environment-setup/log4j.properties" not found: not found
The main reason this failed is that the azureml-environment-setup folder is missing, so the COPY steps have nothing to copy.
Somehow the default image build has access to this folder, which contains some setup files for the environment.
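The three COPY errors could have been spotted before running docker build by checking the Dockerfile against the build folder. The sketch below is an assumption-laden illustration: missing_copy_sources is a hypothetical helper, it only understands the plain shell form "COPY src... dest" used in the Dockerfile above (not the JSON array form), and it does not expand wildcards.

```r
# Hypothetical helper: list COPY sources named in a Dockerfile that do not
# exist under the build context directory.
missing_copy_sources <- function(dockerfile_lines, context_dir = ".") {
  copy_lines <- grep("^\\s*COPY\\s+", dockerfile_lines, value = TRUE)
  sources <- lapply(copy_lines, function(line) {
    parts <- strsplit(trimws(line), "\\s+")[[1]]
    parts <- parts[-1]                        # drop the COPY keyword
    parts <- parts[!startsWith(parts, "--")]  # drop flags such as --chown=...
    head(parts, -1)                           # the last token is the destination
  })
  sources <- unique(as.character(unlist(sources)))
  sources[!file.exists(file.path(context_dir, sources))]
}

# Usage, from the folder that holds the Dockerfile:
# missing_copy_sources(readLines("Dockerfile"), context_dir = ".")
```

An empty result means every COPY source was found in the build folder.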
The two Dockerfiles above install the same azuremlsdk versions in R and Python, but they produce different environments.
The default image has the conda environment:
Check python config
python: /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad/bin/python3
libpython: /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad/lib/libpython3.6m.so
pythonhome: /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad:/azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad
virtualenv:
virtualenv_activate:
version_string: 3.6.10 |Anaconda, Inc.| (default, Mar 23 2020, 23:13:11) [GCC 7.3.0]
version: 3.6
architecture: 64bit
annaconda: TRUE
numpy: /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad/lib/python3.6/site-packages/numpy
python_versions: /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad/bin/python3:/usr/bin/python3
Check environments
name python
1 azureml_da3e97fcb51801118b8e80207f3e01ad /azureml-envs/azureml_da3e97fcb51801118b8e80207f3e01ad/bin/python
Check if azureml is accessible
[1] TRUE
Check if azureml.core is accessible
[1] TRUE
The custom image has the conda environment:
Check python config
python: /opt/miniconda/bin/python3
libpython: /opt/miniconda/lib/libpython3.7m.so
pythonhome: /opt/miniconda:/opt/miniconda
virtualenv:
virtualenv_activate:
version_string: 3.7.7 (default, Mar 23 2020, 22:36:06) [GCC 7.3.0]
version: 3.7
architecture: 64bit
annaconda: FALSE
numpy: /opt/miniconda/lib/python3.7/site-packages/numpy
python_versions: /opt/miniconda/bin/python3:/usr/bin/python3
name python
1 r-reticulate /opt/miniconda/envs/r-reticulate/bin/python
Check if azureml is accessible
[1] TRUE
Check if azureml.core is accessible
[1] FALSE
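One visible difference between the two listings is the interpreter that reticulate selected: the custom image binds to /opt/miniconda/bin/python3, while conda_list() shows a separate r-reticulate environment. As an untested sketch, not a confirmed fix, the entry script could pin reticulate to that environment and fail early if azureml.core is still not importable. bind_python_env is a hypothetical helper; use_condaenv and py_module_available are existing reticulate functions, and the pin has to happen before any Python-backed call in the session.

```r
# Hypothetical helper: pin reticulate to a named conda environment, then
# verify that the azureml modules can be imported from it.
bind_python_env <- function(envname = "r-reticulate",
                            use_env = reticulate::use_condaenv,
                            module_available = reticulate::py_module_available) {
  # required = TRUE makes reticulate error out instead of silently falling
  # back to another interpreter.
  use_env(envname, required = TRUE)
  modules <- c("azureml", "azureml.core")
  ok <- vapply(modules, module_available, logical(1))
  if (!all(ok)) {
    stop("Not importable from '", envname, "': ",
         paste(modules[!ok], collapse = ", "))
  }
  invisible(ok)
}

# Usage, at the top of the entry script:
# bind_python_env("r-reticulate")
```

The use_env and module_available arguments exist only so the check can be exercised without a Python installation.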
UPDATE:
I created a code/azureml-environment-setup folder in the compute instance.
I extracted the 3 files required by this Dockerfile from the built image in the compute cluster by copying them to the outputs folder in the train script:
system("cp /azureml-environment-setup/* ./outputs")
system("cp /etc/apt/apt.conf.d/* ./outputs")
I ran docker build using the Dockerfile for the default compute cluster.
It works!
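The two system("cp ...") calls can also be written in base R, which creates the outputs folder if needed and reports what was copied. This is a sketch under the same assumption as above, namely that the files live in /azureml-environment-setup and /etc/apt/apt.conf.d inside the running image; copy_setup_files is a hypothetical helper name.

```r
# Hypothetical helper: copy every regular file found in the given folders
# into the run's outputs folder and return the names of the copied files.
copy_setup_files <- function(src_dirs, dest = "outputs") {
  dir.create(dest, showWarnings = FALSE, recursive = TRUE)
  files <- as.character(unlist(lapply(src_dirs, list.files, full.names = TRUE)))
  files <- files[!dir.exists(files)]  # skip sub-folders
  copied <- file.copy(files, dest, overwrite = TRUE)
  basename(files[copied])
}

# Usage, inside the train script:
# copy_setup_files(c("/azureml-environment-setup", "/etc/apt/apt.conf.d"))
```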
Thanks for sharing your solution, @hermandr!
Describe the bug: Replicating the Docker image from the article "How to create custom docker base images for azure machine learning environments" produces errors.
No error during docker build.
Based on the accidents example in the vignettes, when I ran R code on the cluster I encountered this error.
R code snippet where the error occurred
Error output
When I use the Dockerfile:
No errors were encountered.
It is important for my client that the image is taken from the enterprise private ACR. This seems to be an older issue appearing again.
To Reproduce:
Expected behavior: The copy of the Dockerfile should work for log_metric()
Screenshots: None
Additional context: Vignette experiments-deep-dive.Rmd