Azure / MachineLearningNotebooks

Python notebooks with ML and deep learning examples with Azure Machine Learning Python SDK | Microsoft
https://docs.microsoft.com/azure/machine-learning/service/
MIT License

LocalWebservice exposing wrong port #1573

Open thomasfrederikhoeck opened 3 years ago

thomasfrederikhoeck commented 3 years ago

When I try to deploy a dummy model with LocalWebservice, it exposes the wrong port, making the deployment fail, which causes it to hang and be unavailable. I run the following code:

from azureml.core import Workspace, Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import LocalWebservice

# subscription_id, resource_group, workspace_name and auth are defined elsewhere.
ws = Workspace(
    subscription_id=subscription_id,
    resource_group=resource_group,
    workspace_name=workspace_name,
    auth=auth,
)

# env is an azureml.core.Environment; the name here is a placeholder.
env = Environment.get(workspace=ws, name="my-inference-env")

deployment_config = LocalWebservice.deploy_configuration(port=6584)
inference_config = InferenceConfig(environment=env, entry_script="./echo_score.py")
service = Model.deploy(
    ws,
    "myservice",
    [],  # no registered models for this dummy service
    inference_config,
    deployment_config,
    overwrite=True,
)

service.wait_for_deployment(show_output=True)
The entry script, echo_score.py:

import json

def init():
    # Runs once when the scoring container starts.
    print("This is init")

def run(data):
    # Runs per scoring request; data is the raw JSON request body.
    test = json.loads(data)
    print(f"received data {test}")
    return f"test is {test}"

It yields the following output, where it hangs at the last step:

Generating Docker build context.
2021/08/13 13:41:37 Downloading source code...
2021/08/13 13:41:38 Finished downloading source code
2021/08/13 13:41:39 Creating Docker network: acb_default_network, driver: 'bridge'
2021/08/13 13:41:39 Successfully set up Docker network: acb_default_network
2021/08/13 13:41:39 Setting up Docker configuration...
2021/08/13 13:41:40 Successfully set up Docker configuration
2021/08/13 13:41:40 Logging in to registry: thecloudacr.azurecr.io
2021/08/13 13:41:41 Successfully logged into thecloudacr.azurecr.io
2021/08/13 13:41:41 Executing step ID: acb_step_0. Timeout(sec): 5400, Working directory: '', Network: 'acb_default_network'
2021/08/13 13:41:41 Scanning for dependencies...
2021/08/13 13:41:42 Successfully scanned dependencies
2021/08/13 13:41:42 Launching container with name: acb_step_0

Step 1/18 : FROM mcr.microsoft.com/azureml/minimal-ubuntu18.04-py37-cpu-inference:20210809.v1@sha256:dacd678dcb61ebdecd5fbb6a481d3ccb3ffba995e9626cbd14c3ed056ee73efa
mcr.microsoft.com/azureml/minimal-ubuntu18.04-py37-cpu-inference:20210809.v1@sha256:dacd678dcb61ebdecd5fbb6a481d3ccb3ffba995e9626cbd14c3ed056ee73efa: Pulling from azureml/minimal-ubuntu18.04-py37-cpu-inference
feac53061382: Pulling fs layer
246bf51163f6: Pulling fs layer
8ebaf78910cc: Pulling fs layer
8fcc981d1355: Pulling fs layer
53c3b28379c0: Pulling fs layer
98b698ee5229: Pulling fs layer
d3a83176dae8: Pulling fs layer
4d4edd5a312a: Pulling fs layer
74eadab87c4c: Pulling fs layer
[... remaining build output truncated ...]

Container has been successfully cleaned up.
Image sha256:1fc6b46f745823c0cf07defba8bef619a82f7eeba0f133e2ecbaa92d969fe037 successfully removed.
Starting Docker container...
Docker container running.
Checking container health...

When I run docker ps, I see that host port 6584 maps to container port 5001:

CONTAINER ID        IMAGE               COMMAND                 CREATED             STATUS              PORTS                                                 NAMES
5d61a0df3ee4        myservice           "runsvdir /var/runit"   4 minutes ago       Up 4 minutes        127.0.0.1:6584->5001/tcp, 127.0.0.1:32775->8883/tcp   intelligent_wing

But if I get the logs with docker logs 5d61a0df3ee4, I get the following, which suggests that port 31311 is the one that should be exposed:

2021-08-13T13:48:30,287489720+00:00 - gunicorn/run
2021-08-13T13:48:30,287524800+00:00 - nginx/run
2021-08-13T13:48:30,296273498+00:00 - rsyslog/run
The entry script directory is /var/azureml-app/.
Dynamic Python package installation is disabled.
Starting AzureML Inference Server HTTP.

Azure ML Inferencing HTTP server v0.3.0

Server Settings
---------------
Entry Script Name: echo_score.py
Model Directory: azureml-models/
Worker Count: 1
Server Port: 31311
Application Insights Enabled: false
Application Insights Key: None

Server Routes
---------------
Liveness Probe: GET   127.0.0.1:31311/
Score:          POST  127.0.0.1:31311/score

Starting gunicorn 20.1.0
Listening at: http://0.0.0.0:31311 (11)
Using worker: sync
Booting worker with pid: 23
Initializing logger
2021-08-13 13:48:31,318 | root | INFO | Starting up app insights client
logging socket was found. logging is available.
logging socket was found. logging is available.
2021-08-13 13:48:31,318 | root | INFO | Starting up request id generator
2021-08-13 13:48:31,318 | root | INFO | Starting up app insight hooks
2021-08-13 13:48:31,318 | root | INFO | Invoking user's init function
This is init
no request id,This is init

2021-08-13 13:48:31,319 | root | INFO | Users's init has completed successfully
2021-08-13 13:48:31,321 | root | INFO | Skipping middleware: dbg_model_info as it's not enabled.
2021-08-13 13:48:31,321 | root | INFO | Skipping middleware: dbg_resource_usage as it's not enabled.
2021-08-13 13:48:31,323 | root | INFO | Scoring timeout setting is not found. Use default timeout: 3600000 ms

If I exec into the container and curl 127.0.0.1:31311/, I do get a proper health check response.

I'm using azureml-core==1.33.0 on Windows with Python 3.8.8.
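
For reference, the same checks can be driven from Python instead of the docker CLI. A minimal sketch, assuming the service object from the code above (get_logs is documented on azureml.core.webservice.Webservice; the port is the one passed to deploy_configuration):

import requests

# Equivalent of `docker logs <container>` through the SDK.
print(service.get_logs())

# Liveness probe via the host-published port (127.0.0.1:6584 in this thread);
# a 200 response means the container is serving the health endpoint.
print(requests.get("http://127.0.0.1:6584/", timeout=5).status_code)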

keriehm commented 3 years ago

5001 is the right port for it to be exposing. The nginx instance inside the container serves as a proxy from port 5001 to the internal gunicorn instance that listens on port 31311.

Other things you can check: can you curl port 5001 from inside and outside the container? How about from inside the notebook, with something like import requests; requests.get('http://localhost:5001')?
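
Expanding that one-liner into a small sketch (the 5001 URL assumes the default container-side port; when calling from outside the container in this thread, substitute the published host port, 6584):

import requests

# Hit the nginx proxy on 5001, which forwards to gunicorn on 31311
# inside the container.
resp = requests.get("http://localhost:5001", timeout=5)
print(resp.status_code, resp.text)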

thomasfrederikhoeck commented 3 years ago

Thank you for the answer @keriehm. I'm not running inside an AzureML Notebook but on my local machine, where I use Docker over SSH (DOCKER_HOST configured).

I was able to curl port 5001 from inside the container. When I connected to the host machine over SSH, I was able to curl port 6584 (the one specified in deployment_config). So I realised it has something to do with which addresses the service listens on. To compare, I also started a standard nginx with docker run -p 81:80 -d nginx. I then ran netstat on the host machine:

tfh@srvdocker01:~$ sudo netstat -tulpn | grep LISTEN
tcp        0      0 127.0.0.1:6584          0.0.0.0:*               LISTEN      10067/docker-proxy
tcp        0      0 127.0.0.1:32770         0.0.0.0:*               LISTEN      10054/docker-proxy
tcp        0      0 127.0.0.53:53           0.0.0.0:*               LISTEN      670/systemd-resolve
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      725/sshd
tcp6       0      0 :::81                   :::*                    LISTEN      9231/docker-proxy
tcp6       0      0 :::22                   :::*                    LISTEN      725/sshd

which shows that the deployed service is only listening on the loopback address (127.0.0.1), while a normal deployment listens on all addresses. Is there some way to configure LocalWebservice.deploy_configuration() so that it listens on all addresses?
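
For illustration, a hedged sketch of the symptom described above (srvdocker01 and port 6584 are taken from the netstat output; nothing here is SDK API):

import requests

# Fails from another machine: docker-proxy binds the LocalWebservice port
# to 127.0.0.1 only, as the netstat output shows.
try:
    requests.get("http://srvdocker01:6584/", timeout=5)
except requests.ConnectionError as exc:
    print(f"unreachable from outside the Docker host: {exc}")

# Succeeds on the Docker host itself, or through an SSH tunnel such as
# `ssh -L 6584:127.0.0.1:6584 srvdocker01`, since the listener is loopback-only.
print(requests.get("http://127.0.0.1:6584/", timeout=5).status_code)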