Closed: jiewpeng closed this issue 2 years ago.
@jiewpeng That's a great suggestion. We are planning to introduce more capabilities that help users gain confidence when the server starts.
I think providing a custom function for readiness is a good way to go. Let me check with the team and we will update the discussion on this issue.
@jiewpeng Are you doing any model validation during training and before pushing into production? I would love to learn more about your context and situation.
We do have validation for some of our models before deploying them; however, this is not able to fully simulate the actual BentoML environment, since when testing we directly call the model instance (e.g. sklearn model, pytorch model) with various inputs, rather than going through runner.run or runner.run_batch, and we do not exercise how the output of these functions gets sliced.
Hi @jiewpeng, maybe a simpler solution is to add an ASGI middleware to intercept the /readyz request and inject your custom validation logic.
In your service definition module, you can add an ASGI middleware by calling svc.add_asgi_middleware().
import bentoml
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import PlainTextResponse

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

class CustomReadyzMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        # match on the request path rather than the full URL so the check
        # does not depend on host and port
        if request.url.path == "/readyz":
            ...  # run your custom validation logic here
            return PlainTextResponse("Not ready", status_code=503)
        return await call_next(request)

svc.add_asgi_middleware(CustomReadyzMiddleware)
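A quick sanity check for this sketch, assuming the service is running at the default bentoml serve address, is to probe the endpoint and inspect the status code:

# Usage sketch (assumes the service is listening on 127.0.0.1:3000):
import requests

resp = requests.get("http://127.0.0.1:3000/readyz")
print(resp.status_code, resp.text)  # expect 503 and "Not ready" while the custom check reports not ready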
@ssheng thanks for the suggestion; unfortunately this method did not allow me to use the runner nor the logger - or am I doing something wrong? This is my service.py file for the iris clf example in the BentoML tutorial. When calling /readyz, I do not see any logs at all, but if I print the exceptions, e.g. by using traceback.print_exc(), I can see RuntimeError: This function can only be run from an AnyIO worker thread in the logs when it tries to use the runner:
import bentoml
from bentoml.io import NumpyNdarray
import logging
import numpy as np
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    result = iris_clf_runner.predict.run(input_series)
    return result

class CustomReadyzMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        logging.info(f"Requested url: {request.url.path}")
        if request.url.path.endswith("/readyz"):
            try:
                # deliberately wrong input just to force an error
                data = [[0.1, 0.1, 0.1]]
                iris_clf_runner.run(data)
            except Exception as e:
                return JSONResponse(
                    {"status": "Not OK", "detail": str(e)}, status_code=503
                )
            return JSONResponse({"status": "OK"}, status_code=200)
        return await call_next(request)

svc.add_asgi_middleware(CustomReadyzMiddleware)
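A minimal sketch that might avoid the AnyIO worker-thread error, assuming the runner handle has already been initialized by the BentoML server, is to await the runner's async_run API instead of calling the blocking run from inside the async dispatch:

import bentoml
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

class AsyncReadyzMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        if request.url.path == "/readyz":
            try:
                # await the runner's async API rather than calling the
                # blocking .run() from an async context
                await iris_clf_runner.predict.async_run([[0.1, 0.1, 0.1, 0.1]])
            except Exception as e:
                return JSONResponse({"status": "Not OK", "detail": str(e)}, status_code=503)
            return JSONResponse({"status": "OK"}, status_code=200)
        return await call_next(request)

svc.add_asgi_middleware(AsyncReadyzMiddleware)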
However, this service file works, though it does so by mounting a FastAPI app and exposing the readiness check on another route - I am not sure what the performance implications of this are, though. Also with this method, the API server log is disconnected from the trace ID for the custom readyz route.
import bentoml
from bentoml.io import NumpyNdarray
from fastapi import FastAPI, HTTPException, status
import logging
import numpy as np

fastapi_app = FastAPI()

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    result = iris_clf_runner.predict.run(input_series)
    return result

@fastapi_app.get("/custom-readyz", status_code=status.HTTP_200_OK)
def ready():
    # deliberately wrong input just to force an error
    data = [[0.1, 0.1, 0.1]]
    try:
        iris_clf_runner.run(data)
    except Exception as e:
        logging.exception(e)
        raise HTTPException(status_code=503, detail=str(e))
    return {"status": "OK"}

svc.mount_asgi_app(fastapi_app)
@jiewpeng To build on top of Sean's suggestion, I think you can use the middleware to redirect the request to /readyz to a service endpoint that does the validation; that way you will have access to the runners.
import bentoml
from bentoml.io import JSON, NumpyNdarray
import numpy as np
from starlette.requests import Request
from starlette.responses import JSONResponse

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    result = iris_clf_runner.predict.run(input_series)
    return result

@svc.api(input=JSON(), output=JSON())
def myreadyz(input):
    try:
        data = [[0.1, 0.1, 0.1]]
        iris_clf_runner.run(data)
    except Exception as e:
        return JSONResponse(
            {"status": "Not OK", "detail": str(e)}, status_code=503
        )
    return JSONResponse({"status": "OK"}, status_code=200)

class CustomReadyzMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        req = Request(scope, receive)
        # redirecting request to /readyz to your custom /myreadyz
        if req.url.path == "/readyz":
            scope["method"] = "POST"
            scope["path"] = "/myreadyz"
        await self.app(scope, receive, send)

svc.add_asgi_middleware(CustomReadyzMiddleware)
Hi @yubozhao, thanks for the suggestion, but this method is not working.
From the logs, I can see that the myreadyz endpoint (which I call readyz_predict) expects a JSON input but does not receive one, so it fails.
2022-07-04T01:29:06+0000 [ERROR] [api_server:1] Exception on /readyz_predict [POST] (trace=81251819796244078624966248506590491305,span=1586327102725891898,sampled=0)
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/bentoml/_internal/server/service_app.py", line 287, in api_func
input_data = await api.input.from_http_request(request)
File "/usr/local/lib/python3.8/site-packages/bentoml/_internal/io_descriptors/json.py", line 184, in from_http_request
raise BadInput(f"Json validation error: {e}") from None
bentoml.exceptions.BadInput: Json validation error: Expecting value: line 1 column 1 (char 0)
Additionally, when I SSH'd into the pod and called requests.post("http://localhost:3000/readyz_predict", json="1") to simulate sending a dummy JSON input, I can avoid this error, but then I hit the next error, which is that the BentoML service does not know how to return the JSONResponse, as it tries to use json.dumps() on it, which doesn't work.
2022-07-04T01:26:48+0000 [ERROR] [api_server:1] Exception on /readyz_predict [POST] (trace=229707093816526215582168499431547298555,span=6784491415553690799,sampled=0)
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/bentoml/_internal/server/service_app.py", line 314, in api_func
response = await api.output.to_http_response(output, ctx)
File "/usr/local/lib/python3.8/site-packages/bentoml/_internal/io_descriptors/json.py", line 187, in to_http_response
json_str = json.dumps(
File "/usr/local/lib/python3.8/json/__init__.py", line 234, in dumps
return cls(
File "/usr/local/lib/python3.8/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/local/lib/python3.8/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/usr/local/lib/python3.8/site-packages/bentoml/_internal/io_descriptors/json.py", line 57, in default
return super().default(o)
File "/usr/local/lib/python3.8/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type JSONResponse is not JSON serializable
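As an aside, one way around this particular serialization error, assuming the JSON() output descriptor, would be to return a plain dict from the endpoint and let BentoML serialize it, rather than returning a starlette JSONResponse (this sketch does not set a custom status code, which the rest of the thread addresses):

from bentoml.io import JSON

@svc.api(input=JSON(), output=JSON())
def myreadyz(input):
    try:
        iris_clf_runner.predict.run([[0.1, 0.1, 0.1, 0.1]])
        # a plain dict is serialized by the JSON() output descriptor
        return {"status": "OK"}
    except Exception as e:
        # note: this still goes out with a 200 status code
        return {"status": "Not OK", "detail": str(e)}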
@jiewpeng sorry about my hasty response. You will need to pass an empty JSON payload as part of the redirect, and also, instead of using JSON for output, you can use Text and return an empty string for successful validation.
With the following service file, it still does not work:
import bentoml
from bentoml.io import JSON, NumpyNdarray, Text
import logging
import numpy as np
from starlette.requests import Request

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    result = iris_clf_runner.predict.run(input_series)
    return result

@svc.api(input=JSON(), output=Text())
def myreadyz(input):
    data = [[0.1, 0.1, 0.1]]
    iris_clf_runner.predict.run(data)
    return ""

class CustomReadyzMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        req = Request(scope, receive)
        # redirecting request to /readyz to your custom /myreadyz
        if req.url.path == "/readyz":
            scope["method"] = "POST"
            scope["path"] = "/myreadyz"
            message = {
                "type": "http.request",
                "body": "1".encode("utf-8"),
                "more_body": False,
            }

            async def create_message():
                return message

            return await self.app(scope, create_message, send)
        await self.app(scope, receive, send)

svc.add_asgi_middleware(CustomReadyzMiddleware)
When I try calling /readyz, it says the runner is not initialized.
2022-07-04T03:53:46+0000 [ERROR] [api_server:7] Exception on /myreadyz [POST] (trace=300799302259605274778301312933085049161,span=16463248736953569975,sampled=0)
Traceback (most recent call last):
File "/home/vscode/.cache/pypoetry/virtualenvs/js-databricks-p8qe_C75-py3.8/lib/python3.8/site-packages/bentoml/_internal/server/service_app.py", line 312, in api_func
output = await run_in_threadpool(api.func, input_data)
File "/home/vscode/.cache/pypoetry/virtualenvs/js-databricks-p8qe_C75-py3.8/lib/python3.8/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/home/vscode/.cache/pypoetry/virtualenvs/js-databricks-p8qe_C75-py3.8/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/vscode/.cache/pypoetry/virtualenvs/js-databricks-p8qe_C75-py3.8/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/vscode/.cache/pypoetry/virtualenvs/js-databricks-p8qe_C75-py3.8/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/vscode/bentoml/bentos/iris_classifier/4kypzvh3jswgaasc/src/service.py", line 42, in myreadyz
iris_clf_runner.predict.run(data)
File "/home/vscode/.cache/pypoetry/virtualenvs/js-databricks-p8qe_C75-py3.8/lib/python3.8/site-packages/bentoml/_internal/runner/runner.py", line 40, in run
return self.runner._runner_handle.run_method( # type: ignore
File "/home/vscode/.cache/pypoetry/virtualenvs/js-databricks-p8qe_C75-py3.8/lib/python3.8/site-packages/bentoml/_internal/runner/runner_handle/__init__.py", line 58, in run_method
raise StateException("Runner is not initialized")
bentoml.exceptions.StateException: Runner is not initialized
2022-07-04T03:53:46+0000 [INFO] [api_server:7] 127.0.0.1:49430 (scheme=http,method=POST,path=/myreadyz,type=,length=) (status=500,type=application/json,length=110) 0.006ms (trace=300799302259605274778301312933085049161,span=16463248736953569975,sampled=0)
In addition, even if it ran without issues, it would still not be possible to return something meaningful, e.g. the error itself (though this is secondary).
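As an aside, a "Runner is not initialized" StateException can also show up when service code is exercised outside a BentoML-managed server process. For local debugging only, a runner can be initialized in-process with init_local(); this is a debugging sketch, not something to add to a service started with bentoml serve, where the server manages runner initialization:

import bentoml

# Debugging sketch only: initialize the runner in-process so .run() works
# outside a BentoML server process.
iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
iris_clf_runner.init_local()
print(iris_clf_runner.predict.run([[5.0, 3.5, 1.5, 0.2]]))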
Can you make sure to update BentoML to the latest rc3 release?
Yup, I upgraded to the rc3 release yesterday.
@jiewpeng I got a different error than the one you posted. However, I was able to get the following example working. We can set up office hours with you if you continue to experience issues. Please let us know.
import bentoml
from bentoml.io import JSON, NumpyNdarray, Text
import numpy as np

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    result = iris_clf_runner.predict.run(input_series)
    return result

@svc.api(input=JSON(), output=Text())
def myreadyz(input):
    data = [[0.1, 0.1, 0.1, 0.1]]
    result = iris_clf_runner.predict.run(data)
    ...
    return ""

class CustomReadyzMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        async def _send(message):
            if message["type"] == "http.response.start":
                message = {
                    "type": "http.response.start",
                    "status": 503,
                    "headers": [
                        [b"content-type", b"text/plain"],
                        [b"content-length", b"3"],
                    ],
                }
            elif message["type"] == "http.response.body":
                if "more_body" in message and message["more_body"]:
                    return
                else:
                    message = {
                        "type": "http.response.body",
                        "body": b"BAD",
                        "more_body": False,
                    }
            await send(message)

        async def _receive():
            await receive()
            message = {
                "type": "http.request",
                "body": b"[[5, 4, 3, 2]]",
                "more_body": False,
            }
            return message

        if "path" in scope and scope["path"] == "/readyz":
            scope["path"] = "/myreadyz"
            scope["method"] = "POST"
            return await self.app(scope, _receive, _send)
        return await self.app(scope, receive, send)

svc.add_asgi_middleware(CustomReadyzMiddleware)
Proof
✗ curl -X GET -v http://127.0.0.1:3000/readyz
Note: Unnecessary use of -X or --request, GET is already inferred.
* Trying 127.0.0.1:3000...
* Connected to 127.0.0.1 (127.0.0.1) port 3000 (#0)
> GET /readyz HTTP/1.1
> Host: 127.0.0.1:3000
> User-Agent: curl/7.77.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Unavailable
< date: Tue, 05 Jul 2022 09:24:11 GMT
< server: uvicorn
< content-type: text/plain
< content-length: 3
<
* Connection #0 to host 127.0.0.1 left intact
BAD%
@ssheng thanks, your code works; however, I don't really see the point of the if message["type"] == "http.response.start" chunk... perhaps I don't understand BentoML's internals well enough. Your code also defeats the purpose of the readiness probe logic, because no matter what happens inside the function, the middleware will just return HTTP 503 "BAD".
I have modified your snippet slightly to look like this. In this case, the readiness probe can return something useful, and we can return the correct status code based on whether or not an exception has occurred within the myreadyz function.
import bentoml
from bentoml.io import NumpyNdarray, JSON, Text
import json
import logging
import numpy as np
import traceback

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    result = iris_clf_runner.predict.run(input_series)
    return result

@svc.api(input=JSON(), output=Text())
def myreadyz(input):
    try:
        data = [[0.1, 0.1, 0.1, 0.1]]
        iris_clf_runner.predict.run(data)
        return json.dumps({"status": "OK"})
    except Exception as e:
        logging.exception(e)
        return json.dumps(
            {
                "status": "Not OK",
                "error": repr(e),
                "traceback": traceback.format_exc(),
            }
        )

class CustomReadyzMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        async def _send(message):
            if message["type"] == "http.response.body":
                message["status"] = 200 if b'"status": "OK"' in message["body"] else 503
            await send(message)

        async def _receive():
            await receive()
            message = {
                "type": "http.request",
                "body": b'{"dummy": "input"}',
            }
            return message

        if "path" in scope and scope["path"] == "/readyz":
            scope["path"] = "/myreadyz"
            scope["method"] = "POST"
            return await self.app(scope, _receive, _send)
        return await self.app(scope, receive, send)

svc.add_asgi_middleware(CustomReadyzMiddleware)
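A side note on the question about the http.response.start chunk: in the ASGI protocol the HTTP status code travels on the "http.response.start" message, not on the body message, which is presumably why the original example intercepted it. A rough sketch of a send wrapper that holds the start message until the body arrives and derives the status from it, assuming a single-chunk body and applied only to the /readyz-redirected request, might look like this:

class StatusFromBodyMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        pending_start = None

        async def _send(message):
            nonlocal pending_start
            if message["type"] == "http.response.start":
                # hold the start message: in ASGI the status code lives here
                pending_start = message
                return
            if message["type"] == "http.response.body" and pending_start is not None:
                ok = b'"status": "OK"' in message.get("body", b"")
                pending_start["status"] = 200 if ok else 503
                await send(pending_start)
                pending_start = None
            await send(message)

        await self.app(scope, receive, _send)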
@jiewpeng Did this solution work out for you? Sorry about not following up. We were busy with the 1.0 release.
@yubozhao yes, the solution worked, though I modified it slightly to fit what I needed. Still, I feel such functionality should be easier to customize - this solution requires the user to fiddle around with what feels more like the internals of BentoML, and if at some point BentoML changes the way its model server works, this solution may break.
The BentoML HTTP server follows the ASGI protocol, so in most cases custom middleware should be supported.
The readiness endpoint should be stable; I don't foresee any huge breaking changes like this in the future.
Maybe we should have a better tutorial on how to customize middleware. What do you think @yubozhao?
the middleware will just return HTTP 503 "BAD".
@jiewpeng, yes, the intention here was to demonstrate how to customize the health check behavior. Glad to see everything worked out. On our side, we should better document health check customization.
Is your feature request related to a problem? Please describe.
Currently, the readiness probe does not check anything; it just returns 200 OK once the app has started up. However, if the developer accidentally introduces a bug into the bento/service file when modifying it, the deployment would be marked as ready when it should not be. This lets a bugged deployment replace a previously working one, resulting in downstream failures.

Describe the solution you'd like
Allow the user to customize the readiness probe / readyz behaviour with a custom function, for instance to call the model with a known valid input and assert that the model returns a valid output, before marking the deployment as ready. This would also allow developers to assert that connections to external resources such as a feature store are working correctly before marking a deployment as ready to accept connections.

Describe alternatives you've considered
The developer can create a new route that is not called /readyz (since it is reserved) to perform these checks, and then modify the Kubernetes deployment to use this route as the readiness probe. Not sure whether this is compatible with Yatai since I do not use it.

Additional context
None
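To make the request concrete, here is a purely hypothetical sketch of the kind of hook being asked for; add_readiness_check is an invented name, not an existing BentoML API:

# Hypothetical sketch only: add_readiness_check is NOT a real BentoML API;
# it illustrates the kind of customization this feature request describes.
def model_is_ready() -> bool:
    sample = [[5.0, 3.5, 1.5, 0.2]]
    result = iris_clf_runner.predict.run(sample)
    return result is not None

svc.add_readiness_check(model_is_ready)  # hypothetical API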