Azure / MachineLearningNotebooks

Python notebooks with ML and deep learning examples with Azure Machine Learning Python SDK | Microsoft
https://docs.microsoft.com/azure/machine-learning/service/
MIT License
Unable to download FileDataset: MemoryError #1143

Status: Closed (casparjespersen closed this issue 4 years ago)

casparjespersen commented 4 years ago

Trying to download a (file) dataset on my local development environment:

from azureml.core import Workspace, Experiment, ScriptRunConfig, Dataset

ws = Workspace.get(...)
data = Dataset.get_by_name(ws, "mydata")

I am able to instantiate the data variable without problems, but when I call data.download() (or even print(data)) I get a MemoryError. I highly doubt this is actually due to memory issues: I have 8 GB of free memory and the dataset is around 5 MB.

Environment is a remote devcontainer (WSL2-based) on VSCode.
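To back up the claim about free memory from inside the container itself, the kernel's own figures can be read directly. The following is a minimal sketch, assuming a Linux environment that exposes /proc/meminfo (true for a WSL2-based devcontainer as far as I know); the helper names parse_meminfo and available_gib are illustrative and not part of the Azure ML SDK.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value.

    Memory fields are reported by the kernel in kB (KiB).
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_gib(meminfo_path="/proc/meminfo"):
    """Return MemAvailable in GiB, or None if the file or the field is missing."""
    path = Path(meminfo_path)
    if not path.exists():
        return None
    kib = parse_meminfo(path.read_text()).get("MemAvailable")
    return None if kib is None else kib / (1024 ** 2)


if __name__ == "__main__":
    print("Available memory (GiB):", available_gib())
```

If this reports several GiB available right before the failing call, a genuine out-of-memory condition for a 5 MB dataset becomes unlikely.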

Error message

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
/workspaces/fixme/core/train.py in 
----> 1 ds.download()

/azureml-envs/azureml/lib/python3.7/site-packages/azureml/data/_loggerfactory.py in wrapper(*args, **kwargs)
    124             with _LoggerFactory.track_activity(logger, func.__name__, activity_type, custom_dimensions) as al:
    125                 try:
--> 126                     return func(*args, **kwargs)
    127                 except Exception as e:
    128                     if hasattr(al, 'activity_info') and hasattr(e, 'error_code'):

/azureml-envs/azureml/lib/python3.7/site-packages/azureml/data/file_dataset.py in download(self, target_path, overwrite)
    124         target_path = _ensure_path(target_path)
    125         download_list = [os.path.abspath(os.path.join(target_path, '.' + p))
--> 126                          for p in self._to_path(activity='download.to_path')]
    127         if not overwrite:
    128             for p in download_list:

/azureml-envs/azureml/lib/python3.7/site-packages/azureml/data/file_dataset.py in _to_path(self, activity)
     99 
    100     def _to_path(self, activity):
--> 101         dataflow, portable_path = _add_portable_path_column(self._dataflow)
    102         dataflow = get_dataflow_for_execution(dataflow, activity, 'FileDataset')
    103         records = dataflow._to_pyrecords()

/azureml-envs/azureml/lib/python3.7/site-packages/azureml/data/_loggerfactory.py in wrapper(*args, **kwargs)
    124             with _LoggerFactory.track_activity(logger, func.__name__, activity_type, custom_dimensions) as al:
    125                 try:
--> 126                     return func(*args, **kwargs)
    127                 except Exception as e:
    128                     if hasattr(al, 'activity_info') and hasattr(e, 'error_code'):

/azureml-envs/azureml/lib/python3.7/site-packages/azureml/data/abstract_dataset.py in _dataflow(self)
    205             raise UserErrorException('Dataset definition is missing. Please check how the dataset is created.')
    206         if self._registration and self._registration.workspace:
--> 207             dataprep().api._datastore_helper._set_auth_type(self._registration.workspace)
    208         if not isinstance(self._definition, dataprep().Dataflow):
    209             try:

/azureml-envs/azureml/lib/python3.7/site-packages/azureml/dataprep/api/_datastore_helper.py in _set_auth_type(workspace)
    136             'password': workspace._auth._service_principal_password
    137         }
--> 138         get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(AuthType.SERVICEPRINCIPAL, json.dumps(auth)))
    139     else:
    140         get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(AuthType.DERIVED, json.dumps(auth)))

/azureml-envs/azureml/lib/python3.7/site-packages/azureml/dataprep/api/engineapi/api.py in get_engine_api()
     16     global _engine_api
     17     if not _engine_api:
---> 18         _engine_api = EngineAPI()
     19 
     20         from .._dataset_resolver import register_dataset_resolver

/azureml-envs/azureml/lib/python3.7/site-packages/azureml/dataprep/api/engineapi/api.py in __init__(self)
     57 
     58         self._message_channel = launch_engine()
---> 59         connect_to_requests_channel()
     60 
     61         self._message_channel.on_relaunch(connect_to_requests_channel)

/azureml-envs/azureml/lib/python3.7/site-packages/azureml/dataprep/api/engineapi/api.py in connect_to_requests_channel()
     53 
     54         def connect_to_requests_channel():
---> 55             self._engine_server_secret = self.sync_host_secret(self.requests_channel.host_secret)
     56             self._engine_server_port = self.sync_host_channel_port(self.requests_channel.port)
     57 

/azureml-envs/azureml/lib/python3.7/site-packages/azureml/dataprep/api/_aml_helper.py in wrapper(op_code, message, cancellation_token)
     36             if len(changed) > 0:
     37                 engine_api_func().update_environment_variable(changed)
---> 38             return send_message_func(op_code, message, cancellation_token)
     39 
     40         return wrapper

/azureml-envs/azureml/lib/python3.7/site-packages/azureml/dataprep/api/engineapi/api.py in sync_host_secret(self, message_args, cancellation_token)
    253     @update_aml_env_vars(get_engine_api)
    254     def sync_host_secret(self, message_args: str, cancellation_token: CancellationToken = None) -> str:
--> 255         response = self._message_channel.send_message('Engine.SyncHostSecret', message_args, cancellation_token)
    256         return response
    257 

/azureml-envs/azureml/lib/python3.7/site-packages/azureml/dataprep/api/engineapi/engine.py in send_message(self, op_code, message, cancellation_token)
    178 
    179             while True:
--> 180                 response = self._read_response()
    181                 if 'error' in response:
    182                     raise_engine_error(response['error'])

/azureml-envs/azureml/lib/python3.7/site-packages/azureml/dataprep/api/engineapi/engine.py in _read_response(self)
    140                 'Engine process terminated. Please try running again.'))
    141             log.error(repr(error))
--> 142             raise error
    143 
    144         parsed = None

MemoryError: Engine process terminated. This is most likely due to system running out of memory. Please retry with increased memory. |session_id=bcc0af7a-9f9e-4ad9-9c37-e97355b32810
MayMSFT commented 4 years ago

Thanks for reporting the issue. We will investigate. In the meantime, which azureml-sdk version are you using? Is it possible to upgrade to the latest SDK version and see whether the problem persists? Thanks!

casparjespersen commented 4 years ago

azureml-automl-core                  1.13.0             
azureml-automl-runtime               1.13.0             
azureml-contrib-notebook             1.13.0             
azureml-core                         1.13.0             
azureml-dataprep                     2.0.7              
azureml-dataprep-native              20.0.2             
azureml-dataset-runtime              1.13.0             
azureml-defaults                     1.13.0             
azureml-explain-model                1.13.0             
azureml-interpret                    1.13.0             
azureml-model-management-sdk         1.0.1b6.post1      
azureml-pipeline                     1.13.0             
azureml-pipeline-core                1.13.0             
azureml-pipeline-steps               1.13.0             
azureml-sdk                          1.13.0             
azureml-telemetry                    1.13.0             
azureml-train                        1.13.0             
azureml-train-automl                 1.13.0             
azureml-train-automl-client          1.13.0.post1       
azureml-train-automl-runtime         1.13.0             
azureml-train-core                   1.13.0             
azureml-train-restclients-hyperdrive 1.13.0             
azureml-widgets                      1.13.0        
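A listing like the one above can also be produced from within the failing Python environment, which rules out inspecting the wrong interpreter. This is a sketch, not an official diagnostic: it uses the standard importlib.metadata module (Python 3.8+), and on Python 3.7 it assumes the importlib_metadata backport is installed. The function name installed_versions is hypothetical.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:
    # Python 3.7 (as in the traceback): assumes the backport package is installed.
    import importlib_metadata as metadata


def installed_versions(prefix="azureml"):
    """Return {distribution name: version} for installed packages whose name starts with prefix."""
    found = {}
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name")
        except Exception:
            # Skip distributions whose metadata cannot be read.
            continue
        if name and name.lower().startswith(prefix.lower()):
            found[name] = dist.version
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name:40s} {version}")
```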
MayMSFT commented 4 years ago

Hi @casparjespersen, WSL is not supported at the moment. You can refer to the list of supported distributions. The team will continue to investigate what causes the engine termination on WSL as a mid-term work item. Thank you!

casparjespersen commented 4 years ago

@MayMSFT A few observations.

MayMSFT commented 4 years ago

Hi @casparjespersen

Thanks for sharing the info. Yes, we are investigating what caused the failure on WSL. We haven't identified the root cause yet. Also, can you share which Linux distribution you are using?

Thanks
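One way to answer the distribution question from inside the container is to read /etc/os-release and check the kernel banner. The sketch below assumes a Linux system with those files present. Treating the substring "microsoft" in /proc/version as a sign of a WSL kernel is a common heuristic rather than a guarantee, and all function names are illustrative.

```python
from pathlib import Path


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict, stripping surrounding quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info


def looks_like_wsl(proc_version_text):
    """Heuristic: WSL kernels usually mention 'microsoft' in /proc/version."""
    return "microsoft" in proc_version_text.lower()


def describe_host():
    """Collect the distribution name, its version, and the WSL heuristic result."""
    os_release = Path("/etc/os-release")
    proc_version = Path("/proc/version")
    info = parse_os_release(os_release.read_text()) if os_release.exists() else {}
    kernel = proc_version.read_text() if proc_version.exists() else ""
    return {
        "distribution": info.get("PRETTY_NAME", "unknown"),
        "version_id": info.get("VERSION_ID", "unknown"),
        "wsl_kernel": looks_like_wsl(kernel),
    }


if __name__ == "__main__":
    print(describe_host())
```

Inside a Docker container the distribution reported is that of the image, while the kernel banner reflects the host (here, the WSL2 kernel), so both values are worth including in a report.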

casparjespersen commented 4 years ago

I am running a Devcontainer in VSCode using Docker for Desktop (WSL2) on Windows 10 (Ver. 2004). The Dockerfile is the following:

FROM mcr.microsoft.com/vscode/devcontainers/base:0-focal

ARG ANACONDA_VERSION=2020.02

ARG AZURE_ML_SDK_EXTRAS=notebooks,automl

# The javascript-node image includes a non-root node user with sudo access. Use 
# the "remoteUser" property in devcontainer.json to use it. On Linux, the container 
# user's GID/UIDs will be updated to match your local UID/GID when using the image
# or dockerFile property. Update USER_UID/USER_GID below if you are using the
# dockerComposeFile property or want the image itself to start with different ID
# values. See https://aka.ms/vscode-remote/containers/non-root-user for details.
ARG USERNAME=vscode
ARG USER_UID=1000
ARG USER_GID=$USER_UID

ARG CONDA_INSTALL_PATH=/opt/conda
ENV PATH=${CONDA_INSTALL_PATH}/bin:${PATH}
ENV LANG=C.UTF-8 LC_ALL=C.UTF-8

# Configure apt and install packages
RUN apt-get update \
    && export DEBIAN_FRONTEND=noninteractive \
    #
    # Alter vscode user as needed
    && if [ "$USER_GID" != "1000" ] || [ "$USER_UID" != "1000" ]; then \
        groupmod --gid $USER_GID $USERNAME \
        && usermod --uid $USER_UID --gid $USER_GID $USERNAME \
        && chown -R $USER_UID:$USER_GID /home/$USERNAME; \
    fi \
    #
    # Install Docker CLI
    && apt-get install -y gnupg-agent software-properties-common \
    && curl -fsSL https://download.docker.com/linux/$(lsb_release -is | tr '[:upper:]' '[:lower:]')/gpg | (OUT=$(apt-key add - 2>&1) || echo $OUT) \
    && add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/$(lsb_release -is | tr '[:upper:]' '[:lower:]') $(lsb_release -cs) stable" \
    && apt-get update \
    && apt-get install -y docker-ce-cli \
    #
    # Set up Anaconda - adapted for Ubuntu from https://github.com/ContinuumIO/docker-images/blob/master/anaconda3/debian/Dockerfile
    # Use vscode user for the installation so that it can be used to manage the conda environment.
    && apt-get install -y bzip2 libglib2.0-0 libxext6 libsm6 libxrender1 gcc g++ \
    && mkdir -p ${CONDA_INSTALL_PATH} \
    && chown ${USERNAME}:root ${CONDA_INSTALL_PATH} \
    && echo "Downloading Anaconda..." \
    && su --login -c "wget -q https://repo.anaconda.com/archive/Anaconda3-${ANACONDA_VERSION}-Linux-x86_64.sh -O /tmp/anaconda-install.sh \
        && /bin/bash /tmp/anaconda-install.sh -u -b -p ${CONDA_INSTALL_PATH}" ${USERNAME} 2>&1  \
    && rm /tmp/anaconda-install.sh \
    && ln -s ${CONDA_INSTALL_PATH}/etc/profile.d/conda.sh /etc/profile.d/conda.sh \
    # Add conda init to .bashrc/.zshrc, tweak ownership if UID was changed
    && export SNIPPET="export PATH=\$PATH:\$HOME/.local/bin \
        && if [ \"\$(stat -c '%U' ${CONDA_INSTALL_PATH})\" != \"${USERNAME}\" ]; then \
            sudo chown -R ${USERNAME}:root ${CONDA_INSTALL_PATH}; \
        fi \
        && . ${CONDA_INSTALL_PATH}/etc/profile.d/conda.sh \
        && conda activate base" \
    && echo "$SNIPPET" | tee -a /root/.bashrc >> /home/${USERNAME}/.bashrc \
    && echo "$SNIPPET" | tee -a /root/.zshrc >> /home/${USERNAME}/.zshrc \
    && find ${CONDA_INSTALL_PATH}/ -follow -type f -name '*.a' -delete \
    && find ${CONDA_INSTALL_PATH}/ -follow -type f -name '*.js.map' -delete \
    && ${CONDA_INSTALL_PATH}/bin/conda clean -afy \
    #
    # Install Azure ML SDK as vscode user so it can be updated by both users
    && su --login -c "${CONDA_INSTALL_PATH}/bin/pip install --no-cache-dir --upgrade azureml-sdk[${AZURE_ML_SDK_EXTRAS}]" ${USERNAME} 2>&1 \
    #
    # Clean up
    && apt-get autoremove -y \
    && apt-get clean -y \
    && rm -rf /var/lib/apt/lists/*

# Install BLAS/LAPACK
RUN apt-get update && \
    apt-get install -y libopenblas-dev liblapack-dev && \
    apt-get autoremove -y && apt-get clean -y && rm -rf /var/lib/apt/lists/* 

# Update the conda environment
COPY environment.yml /azureml-envs/mutated_conda_dependencies.yml
RUN conda env create -p /azureml-envs/azureml -f /azureml-envs/mutated_conda_dependencies.yml

# Install pip packages
ARG PIP_EXTRA_INDEX_URL
ENV PIP_EXTRA_INDEX_URL=${PIP_EXTRA_INDEX_URL}
COPY requirements.txt /azureml-envs/requirements.txt
RUN /bin/bash -c "source activate /azureml-envs/azureml && pip install ipykernel && pip install -r /azureml-envs/requirements.txt"

The environment.yml used to create the conda environment (in the second-to-last block of the Dockerfile) is:

name: base

dependencies:
- python=3.7.9
- pip
- pip:
  - azureml-sdk~=1.13.0
  - azureml-defaults

channels:
- anaconda
- conda-forge
MayMSFT commented 4 years ago

Hi! Thanks for sharing. WSL2 defaults to Ubuntu 20, which is not supported at the moment. You can follow these steps to unblock yourself:

MayMSFT commented 4 years ago

please-close

pjaselin commented 3 years ago

I'm having this same issue, but with the Dataset.Tabular.from_delimited_files command. I have a Python-based web app deployed in a Docker container, and I use this command to register a CSV in a datastore as an AML dataset, but I get the error:

MemoryError: Engine process terminated. This is most likely due to the system running out of memory. Please retry with increased memory.

Inspecting the available memory, I highly doubt the system memory is an issue and the complaint only occurs at this line. Is there a directory I need to give the web app permission to access? I tried the above fixes and I haven't seen a difference in the error. Thanks in advance!
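Because the message text is the same in both reports, it may help to separate this engine failure from an ordinary Python MemoryError when logging. The following is a minimal sketch: it relies only on the "Engine process terminated" wording quoted in this thread, which is an assumption about the message and could change between SDK versions. The names is_engine_crash and run_with_diagnosis are hypothetical helpers, not SDK functions.

```python
# Wording taken from the error messages quoted in this thread (assumed stable).
ENGINE_CRASH_MARKER = "Engine process terminated"


def is_engine_crash(exc):
    """Return True if exc is a MemoryError carrying the engine-termination message."""
    return isinstance(exc, MemoryError) and ENGINE_CRASH_MARKER in str(exc)


def run_with_diagnosis(action):
    """Call action(); on an engine-termination MemoryError, print a hint and re-raise."""
    try:
        return action()
    except MemoryError as exc:
        if is_engine_crash(exc):
            print(
                "The dataprep engine process exited; "
                "this may not be a real out-of-memory condition."
            )
        raise
```

A call such as run_with_diagnosis(lambda: dataset.download()) would then leave the exception untouched while making the likely cause visible in the logs; dataset here stands for whatever dataset object the application already holds.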

Raemi commented 8 months ago

For anyone still stumbling over this issue due to an old dependency on azureml-dataprep that relies on dotnetcore2: these MemoryErrors just mean that the "engine process" crashed or aborted for some reason. After analyzing my problem, I found that it requires libssl1 (a message saying "No suitable libssl found" was printed, but it was swallowed by azureml).

Here is a workaround for Ubuntu 22.04: https://stackoverflow.com/questions/72108697/when-i-open-unity-and-make-something-project-then-the-error-is-coming-that-no

Relevant issue in dotnetcore2: https://github.com/dotnet/core/issues/4749
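To check quickly which libssl the system linker resolves, the standard library's ctypes.util.find_library can be used. This is a sketch under assumptions: it only reports the library that find_library finds first, so other versions may be installed alongside it, and the "1.x" check simply mirrors the libssl1 requirement described in the comment above. The helper names are illustrative.

```python
import ctypes.util
import re


def soname_version(soname):
    """Extract the version suffix from a name such as 'libssl.so.1.1', or None."""
    match = re.search(r"\.so\.(\d+(?:\.\d+)*)$", soname or "")
    return match.group(1) if match else None


def libssl_report():
    """Report which libssl the linker resolves and whether it looks like a 1.x release."""
    soname = ctypes.util.find_library("ssl")
    version = soname_version(soname)
    return {
        "soname": soname,
        "version": version,
        "is_1_x": bool(version) and version.startswith("1."),
    }


if __name__ == "__main__":
    print(libssl_report())
```

On an image that ships only OpenSSL 3, is_1_x would come back False, which matches the situation the workaround above addresses.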