Azure / MachineLearningNotebooks

Python notebooks with ML and deep learning examples with Azure Machine Learning Python SDK | Microsoft
https://docs.microsoft.com/azure/machine-learning/service/
MIT License

Failure to load dependency lib for azureml-monitoring-0.1.0a21 #1741

Open mcompen opened 2 years ago

mcompen commented 2 years ago

I am trying to use azureml-monitoring to collect logs to Blob storage for AzureML AKS model deployments. I am following this page to set it up. However, upon calling the model, I get the following logs from the service pod:

  File "/structure/azureml-app/score.py", line 23, in run
    inputs_dc.collect(data) #this call is saving our input data into Azure Blob
  File "/azureml-envs/sklearn-1.0/lib/python3.8/site-packages/azureml/monitoring/modeldatacollector.py", line 457, in collect
    if (not self._cloud_enabled or self._handle == -1) and not self._debug:
AttributeError: 'ModelDataCollector' object has no attribute '_cloud_enabled'

Further down the stack trace, it shows this:

00000000-0000-0000-0000-000000000000,Invalid or corrupted package: Unable to load dependency library!

It seems like this class is broken.
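
Until the root cause is fixed, one way to keep the scoring call itself from failing is to wrap the collector call so that a broken collector only logs a warning. This is a minimal sketch, not part of azureml-monitoring: `safe_collect` is a hypothetical helper, and it assumes only that the collector object exposes the `collect(data)` method shown in the traceback.

```python
import logging

logger = logging.getLogger("score")


def safe_collect(collector, data):
    """Call collector.collect(data); log and swallow any failure.

    Hypothetical helper: returns True if collection succeeded, False otherwise.
    """
    try:
        collector.collect(data)
        return True
    except Exception as exc:  # e.g. the AttributeError from the traceback
        logger.warning("Data collection failed, continuing without it: %s", exc)
        return False
```

In score.py, `inputs_dc.collect(data)` would become `safe_collect(inputs_dc, data)`. This only hides the symptom: no data reaches Blob storage while the collector is broken.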


Trewitz commented 2 years ago

Any update on this? I'm getting the same error in a production environment.

moeenkhurram commented 2 years ago

I am having the same issue. So far, what I have learned is that azureml.monitoring works only with Python 3.6, 3.5, and 2.7:

[https://github.com/Azure/MachineLearningNotebooks/issues/1806]

Also, https://pypi.org/project/azureml-monitoring/ says:

This package has been tested with Python 2.7 and 3.5.
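
A quick way to check whether the interpreter inside the deployed environment falls in that range is sketched below. The set of versions is taken from the two statements above (2.7, 3.5, 3.6); `TESTED_VERSIONS` and `is_tested_python` are illustrative names, not part of any Azure ML package.

```python
import sys

# Versions reported above as working or tested with azureml-monitoring.
TESTED_VERSIONS = {(2, 7), (3, 5), (3, 6)}


def is_tested_python(version_info=None):
    """Return True if the (major, minor) version is in TESTED_VERSIONS."""
    if version_info is None:
        version_info = sys.version_info
    return (version_info[0], version_info[1]) in TESTED_VERSIONS


if __name__ == "__main__":
    print(sys.version_info[:2], is_tested_python())
```

Run inside the scoring container, this would report False for the python3.8 environment shown in the traceback.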

jnesfield commented 2 years ago

We did some digging, and the problem is related to the way the Python packages deploy models to AKS. It appears the problem is not in the Python or SDK versions but in the base image used to build the container. We traced this back to libssl not being in the available base images. Thus:
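
To check from inside a running container whether an OpenSSL shared library is visible to the dynamic loader, a sketch like the following may help. It uses only the standard library's `ctypes.util.find_library`; `find_libssl` is a hypothetical helper name, and which libssl version azureml-monitoring actually needs is not established in this thread.

```python
import ctypes.util


def find_libssl():
    """Return the name or path of the libssl library the loader can find, or None."""
    return ctypes.util.find_library("ssl")


if __name__ == "__main__":
    found = find_libssl()
    if found is None:
        print("libssl not found in this image")
    else:
        print("libssl found:", found)
```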

First, look at the currently deployed models/tools that are stable and working, using the following code:

import os
import sys
from azureml.core import Workspace
from azureml.core.webservice import AciWebservice, Webservice, AksWebservice
from azureml.core.webservice.aci import VnetConfiguration
from azureml.core.model import Model
from azureml.exceptions import WebserviceException
from azureml.core.compute import AksCompute, ComputeTarget

from azureml.core.authentication import ServicePrincipalAuthentication

# dbutils is available only inside Databricks notebooks; fill in your secret scope and key.
svc_pr_password = dbutils.secrets.get(scope="", key="")
# source_uri = sys.argv[2]

tenant_id = ""
service_principal_id = ""

svc_pr = ServicePrincipalAuthentication(
    tenant_id=tenant_id,
    service_principal_id=service_principal_id,
    service_principal_password=svc_pr_password,
)

workspace_name = ""
subscription_id = ""
resource_group = ""

ws = Workspace.get(name=workspace_name,
                   subscription_id=subscription_id,
                   resource_group=resource_group,
                   auth=svc_pr,
                   )

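from azureml.core import Environment

# Environment.list returns a dict of the environments in the workspace, keyed by name.
environments = Environment.list(ws)

print(type(environments))

print(environments)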

In the output you will find the 'docker:' section, which contains baseImage.
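
Rather than reading the printed output by eye, the base image of each environment can be collected programmatically. This is a sketch that assumes each value in the dictionary is an `azureml.core.Environment` exposing `docker.base_image`; `list_base_images` is a hypothetical helper, written against plain attribute access so it can be tried without a workspace.

```python
def list_base_images(environments):
    """Map environment name -> Docker base image (None if not set).

    `environments` is expected to be a dict such as the one returned by
    Environment.list(ws).
    """
    images = {}
    for name, env in environments.items():
        docker = getattr(env, "docker", None)
        images[name] = getattr(docker, "base_image", None)
    return images
```

With a workspace at hand, `list_base_images(Environment.list(ws))` would give one base image per environment name, which makes it easier to compare working and failing deployments.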

We found that our last stable deployments all used this image:

"baseImage": "mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20220314.v1",

To add this to your deployment script, declare the base image as a string:

base_image = r"mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20220314.v1"

Then add it to your environment definition:

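from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.model import InferenceConfig

# myenv is assumed to be an existing azureml.core.Environment object;
# env_file_path and entry_script_file are paths defined elsewhere in the deployment script.
conda_dep = CondaDependencies(conda_dependencies_file_path=env_file_path)
conda_dep.add_pip_package("azureml-defaults")
conda_dep.add_pip_package("azureml-contrib-services")

myenv.python.conda_dependencies = conda_dep
myenv.docker.base_image = base_image
myenv.register(workspace=ws)
inference_config = InferenceConfig(entry_script=entry_script_file, environment=myenv)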

As long as you have no other dependency issues, it will work.

Here are the conda dependencies in our YAML, for example:

channels:
  - defaults
dependencies:
  - pandas==1.2.4
  - pip=20.2.4
  - python=3.8.10
  - numpy==1.19.2
  - werkzeug=1.0.1
  - pip:
    - azure-common==1.1.27
    - azure-core==1.11.0
    - azure-graphrbac==0.61.1
    - azure-mgmt-authorization==0.61.0
    - azure-mgmt-containerregistry==8.0.0
    - azure-mgmt-core==1.2.2
    - azure-mgmt-keyvault==2.2.0
    - azure-mgmt-resource==13.0.0
    - azure-mgmt-storage==11.2.0
    - azure-storage-blob==12.7.1
    - azureml==0.2.7
    - azureml-core==1.29.0
    - azureml-monitoring
    - prometheus-client
    - tensorboard==2.8.0
    - tensorboard-data-server==0.6.1
    - tensorboard-plugin-wit==1.8.1
    - tensorflow==2.7.0
    - tensorflow-estimator==2.7.0
    - tensorflow-io-gcs-filesystem==0.24.0
    - Pillow==8.2.0
    - SciPy==1.6.2
    - opencv-python==4.2.0.34

vizhur commented 1 year ago

I don't think the problem is related to the base image. Can you please confirm that you can build a new derived image based on 20220314.v1 that actually works? According to the environment itself, you are using azureml-monitoring, which was never released beyond a candidate (https://pypi.org/project/azureml-monitoring/#history). The last release, version 0.1.0a21, happened almost 3 years ago. Its upper Python version is specified as py36, though your environment is py38, so it is hard to expect it even to install successfully, unless you use a resolver that ignores all the conflicts and lets incompatible dependencies get installed, like pip<20.3.
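
The pip cutoff mentioned here can be checked mechanically: pip 20.3 made the new dependency resolver the default, so earlier versions (such as the pip=20.2.4 pinned in the YAML above) can install conflicting packages without failing. The sketch below is illustrative; `uses_legacy_resolver` is a hypothetical helper, and it ignores the `--use-feature` and `--use-deprecated` flags that can override the default.

```python
import re


def uses_legacy_resolver(pip_version):
    """Return True if this pip version defaults to the legacy resolver (< 20.3)."""
    match = re.match(r"(\d+)\.(\d+)", pip_version)
    if match is None:
        raise ValueError("unrecognized pip version: %r" % pip_version)
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (20, 3)
```

For the environment in this thread, `uses_legacy_resolver("20.2.4")` is True, which is consistent with the incompatible package having been installed without an error.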