MicrosoftLearning / mslearn-azure-ml

https://microsoftlearning.github.io/mslearn-azure-ml/
MIT License

Lab 11: ResourceNotReady error #100

Closed majid75 closed 1 month ago

majid75 commented 2 months ago

Module: Deploy a model to a managed online endpoint

Lab/Demo: 11: Deploy an MLflow model to an online endpoint

Task: 05 Create the deployment

Step: 01 Running the cell "ml_client.online_deployments.begin_create_or_update(blue_deployment).result()"

Description of issue: Running this cell "ml_client.online_deployments.begin_create_or_update(blue_deployment).result()", I got the following error: "

OperationFailed                           Traceback (most recent call last)
File /anaconda/envs/azureml_py38/lib/python3.9/site-packages/azure/core/polling/base_polling.py:757, in LROBasePolling.run(self)
    756 try:
--> 757     self._poll()
    759 except BadStatus as err:

File /anaconda/envs/azureml_py38/lib/python3.9/site-packages/azure/core/polling/base_polling.py:789, in LROBasePolling._poll(self)
    788 if _failed(self.status()):
--> 789     raise OperationFailed("Operation failed or canceled")
    791 final_get_url = self._operation.get_final_get_url(self._pipeline_response)

OperationFailed: Operation failed or canceled

The above exception was the direct cause of the following exception:

HttpResponseError                         Traceback (most recent call last)
Cell In[19], line 1
----> 1 ml_client.online_deployments.begin_create_or_update(blue_deployment).result()

File /anaconda/envs/azureml_py38/lib/python3.9/site-packages/azure/core/polling/_poller.py:251, in LROPoller.result(self, timeout)
    242 def result(self, timeout: Optional[float] = None) -> PollingReturnType_co:
    243     """Return the result of the long running operation, or
    244     the result available after the specified timeout.
    245
   (...)
    249     :raises ~azure.core.exceptions.HttpResponseError: Server problem with the query.
    250     """
--> 251     self.wait(timeout)
    252     return self._polling_method.resource()

File /anaconda/envs/azureml_py38/lib/python3.9/site-packages/azure/core/tracing/decorator.py:78, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
     76 span_impl_type = settings.tracing_implementation()
     77 if span_impl_type is None:
---> 78     return func(*args, **kwargs)
     80 # Merge span is parameter is set, but only if no explicit parent are passed
     81 if merge_span and not passed_in_parent:

File /anaconda/envs/azureml_py38/lib/python3.9/site-packages/azure/core/polling/_poller.py:270, in LROPoller.wait(self, timeout)
    266 self._thread.join(timeout=timeout)
    267 try:
    268     # Let's handle possible None in forgiveness here
    269     # https://github.com/python/mypy/issues/8165
--> 270     raise self._exception  # type: ignore
    271 except TypeError:  # Was None
    272     pass

File /anaconda/envs/azureml_py38/lib/python3.9/site-packages/azure/core/polling/_poller.py:185, in LROPoller._start(self)
    181 """Start the long running operation.
    182 On completion, runs any callbacks.
    183 """
    184 try:
--> 185     self._polling_method.run()
    186 except AzureError as error:
    187     if not error.continuation_token:

File /anaconda/envs/azureml_py38/lib/python3.9/site-packages/azure/core/polling/base_polling.py:772, in LROBasePolling.run(self)
    765     raise HttpResponseError(
    766         response=self._pipeline_response.http_response,
    767         message=str(err),
    768         error=err,
    769     ) from err
    771 except OperationFailed as err:
--> 772     raise HttpResponseError(response=self._pipeline_response.http_response, error=err) from err

HttpResponseError: (ResourceNotReady) User container has crashed or terminated: Liveness probe failed: HTTP probe failed with statuscode: 502. Please see troubleshooting guide, available here: https://aka.ms/oe-tsg#error-resourcenotready
Code: ResourceNotReady
Message: User container has crashed or terminated: Liveness probe failed: HTTP probe failed with statuscode: 502. Please see troubleshooting guide, available here: https://aka.ms/oe-tsg#error-resourcenotready "

The error log of the endpoint shows: "

Instance status:
SystemSetup: Succeeded
UserContainerImagePull: Succeeded
ModelDownload: Succeeded
UserContainerStart: Succeeded

Container events:
Kind: Pod, Name: LivenessProbeFailed, Type: Warning, Time: 2024-09-18T04:37:30.13373Z, Message: Liveness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: ReadinessProbeFailed, Type: Warning, Time: 2024-09-18T04:37:37.264146Z, Message: Readiness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: LivenessProbeFailed, Type: Warning, Time: 2024-09-18T04:37:40.130892Z, Message: Liveness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: ReadinessProbeFailed, Type: Warning, Time: 2024-09-18T04:37:47.266695Z, Message: Readiness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: LivenessProbeFailed, Type: Warning, Time: 2024-09-18T04:37:50.133494Z, Message: Liveness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: ReadinessProbeFailed, Type: Warning, Time: 2024-09-18T04:37:57.273778Z, Message: Readiness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: LivenessProbeFailed, Type: Warning, Time: 2024-09-18T04:38:00.133587Z, Message: Liveness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: ReadinessProbeFailed, Type: Warning, Time: 2024-09-18T04:38:07.26298Z, Message: Readiness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: LivenessProbeFailed, Type: Warning, Time: 2024-09-18T04:38:10.13004Z, Message: Liveness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: ReadinessProbeFailed, Type: Warning, Time: 2024-09-18T04:38:17.271268Z, Message: Readiness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: LivenessProbeFailed, Type: Warning, Time: 2024-09-18T04:38:20.133495Z, Message: Liveness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: ReadinessProbeFailed, Type: Warning, Time: 2024-09-18T04:38:27.278977Z, Message: Readiness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: LivenessProbeFailed, Type: Warning, Time: 2024-09-18T04:38:30.129938Z, Message: Liveness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: ReadinessProbeFailed, Type: Warning, Time: 2024-09-18T04:38:37.267241Z, Message: Readiness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: LivenessProbeFailed, Type: Warning, Time: 2024-09-18T04:38:40.133582Z, Message: Liveness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: ReadinessProbeFailed, Type: Warning, Time: 2024-09-18T04:38:47.266312Z, Message: Readiness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: LivenessProbeFailed, Type: Warning, Time: 2024-09-18T04:38:50.079849Z, Message: Liveness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: ReadinessProbeFailed, Type: Warning, Time: 2024-09-18T04:38:57.269325Z, Message: Readiness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: LivenessProbeFailed, Type: Warning, Time: 2024-09-18T04:39:00.130378Z, Message: Liveness probe failed: HTTP probe failed with statuscode: 502
Kind: Pod, Name: Killing, Type: Normal, Time: 2024-09-18T04:39:00.14249Z, Message: Stopping container inference-server
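The probe failures listed above can be tallied rather than read one by one. The following is a minimal sketch, not part of the lab: it assumes the events are available as pasted text in the `Kind: ..., Name: ..., Type: ..., Time: ..., Message: ...` shape shown in this log, and `EVENT_RE`, `parse_events`, and `summarize` are hypothetical helper names introduced only for illustration. The sample holds three events copied from the log.

```python
import re
from collections import Counter
from datetime import datetime

# Matches one event; works whether events are separated by spaces or newlines.
EVENT_RE = re.compile(
    r"Kind: (?P<kind>\w+), Name: (?P<name>\w+), Type: (?P<type>\w+), "
    r"Time:\s+(?P<time>[^,\s]+), Message: (?P<message>.*?)(?=\s+Kind: |\s*\Z)",
    re.DOTALL,
)


def parse_events(text):
    """Return a list of dicts, one per container event found in `text`."""
    events = []
    for match in EVENT_RE.finditer(text):
        event = match.groupdict()
        # Keep whole seconds only; the fractional part varies in length.
        event["when"] = datetime.strptime(event["time"][:19], "%Y-%m-%dT%H:%M:%S")
        events.append(event)
    return events


def summarize(events):
    """Return (count per event name, seconds between first and last event)."""
    counts = Counter(event["name"] for event in events)
    if not events:
        return counts, 0.0
    span = (events[-1]["when"] - events[0]["when"]).total_seconds()
    return counts, span


# Three events copied from the log in this issue.
SAMPLE = (
    "Kind: Pod, Name: LivenessProbeFailed, Type: Warning, "
    "Time: 2024-09-18T04:37:30.13373Z, "
    "Message: Liveness probe failed: HTTP probe failed with statuscode: 502 "
    "Kind: Pod, Name: ReadinessProbeFailed, Type: Warning, "
    "Time: 2024-09-18T04:37:37.264146Z, "
    "Message: Readiness probe failed: HTTP probe failed with statuscode: 502 "
    "Kind: Pod, Name: Killing, Type: Normal, "
    "Time: 2024-09-18T04:39:00.14249Z, "
    "Message: Stopping container inference-server"
)

events = parse_events(SAMPLE)
counts, span_seconds = summarize(events)
print(dict(counts), span_seconds)
```

Run over the full event list, the same helpers give the number of failed probes of each kind and the time between the first failed probe and the `Killing` event.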

Container logs:
2024-09-18T04:33:53,098708367+00:00 - rsyslog/run
2024-09-18T04:33:53,138811064+00:00 - gunicorn/run
2024-09-18T04:33:53,144150523+00:00 - nginx/run
2024-09-18T04:33:53,148932587+00:00 | gunicorn/run |
2024-09-18T04:33:53,152644359+00:00 | gunicorn/run | ###############################################
2024-09-18T04:33:53,165077165+00:00 | gunicorn/run | AzureML Container Runtime Information
2024-09-18T04:33:53,174006598+00:00 | gunicorn/run | ###############################################
2024-09-18T04:33:53,231293864+00:00 | gunicorn/run | nginx: [warn] the "user" directive makes sense only if the master process runs with super-user privileges, ignored in /etc/nginx/nginx.conf:1
2024-09-18T04:33:53,253206299+00:00 | gunicorn/run |
2024-09-18T04:33:53,270932165+00:00 | gunicorn/run | AzureML image information: mlflow-ubuntu20.04-py38-cpu-inference:20240805.v5
2024-09-18T04:33:53,274022041+00:00 | gunicorn/run |
2024-09-18T04:33:53,278574607+00:00 | gunicorn/run |
2024-09-18T04:33:53,281480185+00:00 | gunicorn/run | PATH environment variable: /opt/miniconda/envs/amlenv/bin:/opt/miniconda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2024-09-18T04:33:53,286665146+00:00 | gunicorn/run | PYTHONPATH environment variable:
2024-09-18T04:33:53,289662823+00:00 | gunicorn/run |
2024-09-18T04:33:59,359224849+00:00 | gunicorn/run | CONDAPATH environment variable: /opt/miniconda

# conda environments:
#
base    /opt/miniconda
amlenv  /opt/miniconda/envs/amlenv

2024-09-18T04:34:02,334232432+00:00 | gunicorn/run |
2024-09-18T04:34:02,340537887+00:00 | gunicorn/run | Pip Dependencies (before dynamic installation)

annotated-types==0.7.0
azure-core==1.30.2
azure-identity==1.17.1
azureml-inference-server-http==1.3.0
blinker==1.8.2
cachetools==5.4.0
certifi==2024.7.4
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
cryptography==43.0.0
Flask==2.3.2
Flask-Cors==3.0.10
google-api-core==2.19.1
google-auth==2.32.0
googleapis-common-protos==1.63.2
gunicorn==22.0.0
idna==3.7
importlib_metadata==8.2.0
inference-schema==1.8
itsdangerous==2.2.0
Jinja2==3.1.4
MarkupSafe==2.1.5
msal==1.30.0
msal-extensions==1.2.0
opencensus==0.11.4
opencensus-context==0.1.3
opencensus-ext-azure==1.1.13
packaging==24.1
portalocker==2.10.1
proto-plus==1.24.0
protobuf==5.27.3
psutil==6.0.0
pyasn1==0.6.0
pyasn1_modules==0.4.0
pycparser==2.22
pydantic==2.7.4
pydantic-settings==2.4.0
pydantic_core==2.18.4
PyJWT==2.9.0
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.1
requests==2.32.3
rsa==4.9
six==1.16.0
typing_extensions==4.12.2
urllib3==2.2.2
Werkzeug==3.0.3
wrapt==1.16.0
zipp==3.19.2

2024-09-18T04:34:08,239187463+00:00 | gunicorn/run |
2024-09-18T04:34:08,243970230+00:00 | gunicorn/run | Entry script directory: /var/mlflow_resources/.
2024-09-18T04:34:08,247394007+00:00 | gunicorn/run |
2024-09-18T04:34:08,251678578+00:00 | gunicorn/run | ###############################################
2024-09-18T04:34:08,255065055+00:00 | gunicorn/run | Dynamic Python Package Installation
2024-09-18T04:34:08,258553631+00:00 | gunicorn/run | ###############################################
2024-09-18T04:34:08,261841809+00:00 | gunicorn/run |
2024-09-18T04:34:08,265344385+00:00 | gunicorn/run | Updating conda environment from /var/azureml-app/azureml-models/414391c12d465d8fabd8e8afa6a788fd/1/model/conda.yaml !
Retrieving notices: ...working... done
Channels:

Downloading and Extracting Packages: ...working... done
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Installing pip dependencies: ...working... "
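The log above stops at `Installing pip dependencies: ...working...`, so more output may be needed to see what happened afterwards. Below is a minimal sketch of pulling the logs from the notebook with `ml_client.online_deployments.get_logs`, assuming `ml_client` is the authenticated `MLClient` created earlier in the lab. The helper name `collect_deployment_logs` and the default of 200 lines are assumptions made for illustration, not part of the lab.

```python
def collect_deployment_logs(ml_client, endpoint_name, deployment_name, lines=200):
    """Fetch recent logs from both containers of a managed online deployment.

    `ml_client` is expected to be an azure.ai.ml.MLClient (or any object that
    exposes the same `online_deployments.get_logs` method).
    """
    logs = {}
    # The inference server runs the model; the storage initializer downloads it.
    for container_type in ("inference-server", "storage-initializer"):
        logs[container_type] = ml_client.online_deployments.get_logs(
            name=deployment_name,
            endpoint_name=endpoint_name,
            lines=lines,
            container_type=container_type,
        )
    return logs


# Hypothetical usage in the lab notebook; both names below are placeholders:
# logs = collect_deployment_logs(ml_client, "<endpoint-name>", "<deployment-name>")
# print(logs["inference-server"])
```

Keeping the two containers separate makes it easier to tell whether the failure happened while the model was being downloaded or after the inference server had started.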

afelix-95 commented 2 months ago

@majid75 did you go through this lab in a VM environment or with your own subscription? I couldn't repro your error, so it may have been a temporary issue.

afelix-95 commented 1 month ago

Closing it as non-repro.