Closed: jiewpeng closed this issue 2 years ago.
@jiewpeng That's a great suggestion. We are planning to introduce more capabilities that help users gain confidence when the server starts.
I think providing a custom function for readiness is a good way to go. Let me check with the team and we will update the discussion on this issue.
@jiewpeng Are you doing any model validation during training and before pushing into production? I would love to learn more about your context and situation.
We do have validation for some of our models before deploying them; however, this is not able to fully simulate the actual BentoML environment, since when testing we directly call the model instance (e.g. sklearn model, pytorch model) with various inputs, rather than going through runner.run or runner.run_batch, and we do not exercise how the output of these functions gets sliced.
Hi @jiewpeng, maybe a simpler solution is to add an ASGI middleware to intercept the /readyz request and inject your custom validation logic.
In your service definition module, you can add an ASGI middleware by calling svc.add_asgi_middleware().
import bentoml
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import PlainTextResponse

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

class CustomReadyzMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        # match on the request path rather than the full URL so the check
        # does not depend on host and port
        if request.url.path == "/readyz":
            ...  # run your custom validation logic here
            return PlainTextResponse("Not ready", status_code=503)
        return await call_next(request)

svc.add_asgi_middleware(CustomReadyzMiddleware)
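A quick sanity check for this sketch, assuming the service is running at the default bentoml serve address, is to probe the endpoint and inspect the status code:

# Usage sketch (assumes the service is listening on 127.0.0.1:3000):
import requests

resp = requests.get("http://127.0.0.1:3000/readyz")
print(resp.status_code, resp.text)  # expect 503 and "Not ready" while the custom check reports not ready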
@ssheng thanks for the suggestion; unfortunately this method did not allow me to use the runner nor the logger - or am I doing something wrong? This is my service.py file for the iris clf example in the BentoML tutorial. When calling /readyz, I do not see any logs at all, but if I print the exceptions, e.g. by using traceback.print_exc(), I can see RuntimeError: This function can only be run from an AnyIO worker thread in the logs when it tries to use the runner:
import bentoml
from bentoml.io import NumpyNdarray
import logging
import numpy as np
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    result = iris_clf_runner.predict.run(input_series)
    return result

class CustomReadyzMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        logging.info(f"Requested url: {request.url.path}")
        if request.url.path.endswith("/readyz"):
            try:
                # deliberately wrong input just to force an error
                data = [[0.1, 0.1, 0.1]]
                iris_clf_runner.run(data)
            except Exception as e:
                return JSONResponse(
                    {"status": "Not OK", "detail": str(e)}, status_code=503
                )
            return JSONResponse({"status": "OK"}, status_code=200)
        return await call_next(request)

svc.add_asgi_middleware(CustomReadyzMiddleware)
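A minimal sketch that might avoid the AnyIO worker-thread error, assuming the runner handle has already been initialized by the BentoML server, is to await the runner's async_run API instead of calling the blocking run from inside the async dispatch:

import bentoml
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

class AsyncReadyzMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        if request.url.path == "/readyz":
            try:
                # await the runner's async API rather than calling the
                # blocking .run() from an async context
                await iris_clf_runner.predict.async_run([[0.1, 0.1, 0.1, 0.1]])
            except Exception as e:
                return JSONResponse({"status": "Not OK", "detail": str(e)}, status_code=503)
            return JSONResponse({"status": "OK"}, status_code=200)
        return await call_next(request)

svc.add_asgi_middleware(AsyncReadyzMiddleware)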
However, this service file works, though it does so by mounting a FastAPI app and exposing the readiness check on another route - I am not sure what the performance implications of this are, though. Also with this method, the API server log is disconnected from the trace ID for the custom readyz route.
import bentoml
from bentoml.io import NumpyNdarray
from fastapi import FastAPI, HTTPException, status
import logging
import numpy as np

fastapi_app = FastAPI()

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    result = iris_clf_runner.predict.run(input_series)
    return result

@fastapi_app.get("/custom-readyz", status_code=status.HTTP_200_OK)
def ready():
    # deliberately wrong input just to force an error
    data = [[0.1, 0.1, 0.1]]
    try:
        iris_clf_runner.run(data)
    except Exception as e:
        logging.exception(e)
        raise HTTPException(status_code=503, detail=str(e))
    return {"status": "OK"}

svc.mount_asgi_app(fastapi_app)
@jiewpeng To build on top of Sean's suggestion, I think you can use the middleware to redirect the request to /readyz to a service endpoint that does the validation; that way you will have access to the runners.
import bentoml
from bentoml.io import JSON, NumpyNdarray
import numpy as np
from starlette.requests import Request
from starlette.responses import JSONResponse

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    result = iris_clf_runner.predict.run(input_series)
    return result

@svc.api(input=JSON(), output=JSON())
def myreadyz(input):
    try:
        data = [[0.1, 0.1, 0.1]]
        iris_clf_runner.run(data)
    except Exception as e:
        return JSONResponse(
            {"status": "Not OK", "detail": str(e)}, status_code=503
        )
    return JSONResponse({"status": "OK"}, status_code=200)

class CustomReadyzMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        req = Request(scope, receive)
        # redirecting request to /readyz to your custom /myreadyz
        if req.url.path == "/readyz":
            scope["method"] = "POST"
            scope["path"] = "/myreadyz"
        await self.app(scope, receive, send)

svc.add_asgi_middleware(CustomReadyzMiddleware)
Hi @yubozhao, thanks for the suggestion, but this method is not working.
From the logs, I can see that the myreadyz endpoint (which I call readyz_predict) expects a JSON input but does not receive one, so it fails.
2022-07-04T01:29:06+0000 [ERROR] [api_server:1] Exception on /readyz_predict [POST] (trace=81251819796244078624966248506590491305,span=1586327102725891898,sampled=0)
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/bentoml/_internal/server/service_app.py", line 287, in api_func
input_data = await api.input.from_http_request(request)
File "/usr/local/lib/python3.8/site-packages/bentoml/_internal/io_descriptors/json.py", line 184, in from_http_request
raise BadInput(f"Json validation error: {e}") from None
bentoml.exceptions.BadInput: Json validation error: Expecting value: line 1 column 1 (char 0)
Additionally, when I SSH'd into the pod and called requests.post("http://localhost:3000/readyz_predict", json="1") to simulate sending a dummy JSON input, I can avoid this error, but then I hit the next error, which is that the BentoML service does not know how to return the JSONResponse, as it tries to use json.dumps() on it, which doesn't work.
2022-07-04T01:26:48+0000 [ERROR] [api_server:1] Exception on /readyz_predict [POST] (trace=229707093816526215582168499431547298555,span=6784491415553690799,sampled=0)
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/bentoml/_internal/server/service_app.py", line 314, in api_func
response = await api.output.to_http_response(output, ctx)
File "/usr/local/lib/python3.8/site-packages/bentoml/_internal/io_descriptors/json.py", line 187, in to_http_response
json_str = json.dumps(
File "/usr/local/lib/python3.8/json/__init__.py", line 234, in dumps
return cls(
File "/usr/local/lib/python3.8/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/local/lib/python3.8/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/usr/local/lib/python3.8/site-packages/bentoml/_internal/io_descriptors/json.py", line 57, in default
return super().default(o)
File "/usr/local/lib/python3.8/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type JSONResponse is not JSON serializable
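As an aside, one way around this particular serialization error, assuming the JSON() output descriptor, would be to return a plain dict from the endpoint and let BentoML serialize it, rather than returning a starlette JSONResponse (this sketch does not set a custom status code, which the rest of the thread addresses):

from bentoml.io import JSON

@svc.api(input=JSON(), output=JSON())
def myreadyz(input):
    try:
        iris_clf_runner.predict.run([[0.1, 0.1, 0.1, 0.1]])
        # a plain dict is serialized by the JSON() output descriptor
        return {"status": "OK"}
    except Exception as e:
        # note: this still goes out with a 200 status code
        return {"status": "Not OK", "detail": str(e)}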
@jiewpeng sorry about my hasty response. You will need to pass an empty JSON payload as part of the redirect, and also, instead of using JSON for output, you can use Text and return an empty string for successful validation.
With the following service file, it still does not work:
import bentoml
from bentoml.io import JSON, NumpyNdarray, Text
import logging
import numpy as np
from starlette.requests import Request

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    result = iris_clf_runner.predict.run(input_series)
    return result

@svc.api(input=JSON(), output=Text())
def myreadyz(input):
    data = [[0.1, 0.1, 0.1]]
    iris_clf_runner.predict.run(data)
    return ""

class CustomReadyzMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        req = Request(scope, receive)
        # redirecting request to /readyz to your custom /myreadyz
        if req.url.path == "/readyz":
            scope["method"] = "POST"
            scope["path"] = "/myreadyz"
            message = {
                "type": "http.request",
                "body": "1".encode("utf-8"),
                "more_body": False,
            }

            async def create_message():
                return message

            return await self.app(scope, create_message, send)
        await self.app(scope, receive, send)

svc.add_asgi_middleware(CustomReadyzMiddleware)
When I try calling /readyz, it says the runner is not initialized.
2022-07-04T03:53:46+0000 [ERROR] [api_server:7] Exception on /myreadyz [POST] (trace=300799302259605274778301312933085049161,span=16463248736953569975,sampled=0)
Traceback (most recent call last):
File "/home/vscode/.cache/pypoetry/virtualenvs/js-databricks-p8qe_C75-py3.8/lib/python3.8/site-packages/bentoml/_internal/server/service_app.py", line 312, in api_func
output = await run_in_threadpool(api.func, input_data)
File "/home/vscode/.cache/pypoetry/virtualenvs/js-databricks-p8qe_C75-py3.8/lib/python3.8/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
return await anyio.to_thread.run_sync(func, *args)
File "/home/vscode/.cache/pypoetry/virtualenvs/js-databricks-p8qe_C75-py3.8/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/vscode/.cache/pypoetry/virtualenvs/js-databricks-p8qe_C75-py3.8/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/home/vscode/.cache/pypoetry/virtualenvs/js-databricks-p8qe_C75-py3.8/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/home/vscode/bentoml/bentos/iris_classifier/4kypzvh3jswgaasc/src/service.py", line 42, in myreadyz
iris_clf_runner.predict.run(data)
File "/home/vscode/.cache/pypoetry/virtualenvs/js-databricks-p8qe_C75-py3.8/lib/python3.8/site-packages/bentoml/_internal/runner/runner.py", line 40, in run
return self.runner._runner_handle.run_method( # type: ignore
File "/home/vscode/.cache/pypoetry/virtualenvs/js-databricks-p8qe_C75-py3.8/lib/python3.8/site-packages/bentoml/_internal/runner/runner_handle/__init__.py", line 58, in run_method
raise StateException("Runner is not initialized")
bentoml.exceptions.StateException: Runner is not initialized
2022-07-04T03:53:46+0000 [INFO] [api_server:7] 127.0.0.1:49430 (scheme=http,method=POST,path=/myreadyz,type=,length=) (status=500,type=application/json,length=110) 0.006ms (trace=300799302259605274778301312933085049161,span=16463248736953569975,sampled=0)
In addition, even if it ran without issues, it would still not be possible to return something meaningful, e.g. the error itself (though this is secondary).
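As an aside, a "Runner is not initialized" StateException can also show up when service code is exercised outside a BentoML-managed server process. For local debugging only, a runner can be initialized in-process with init_local(); this is a debugging sketch, not something to add to a service started with bentoml serve, where the server manages runner initialization:

import bentoml

# Debugging sketch only: initialize the runner in-process so .run() works
# outside a BentoML server process.
iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
iris_clf_runner.init_local()
print(iris_clf_runner.predict.run([[5.0, 3.5, 1.5, 0.2]]))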
Can you make sure to update BentoML to the latest rc3 release?
Yup, I upgraded to the rc3 release yesterday.
@jiewpeng I got a different error than the one you posted. However, I was able to get the following example working. We can set up office hours with you if you continue to experience issues. Please let us know.
import bentoml
from bentoml.io import JSON, NumpyNdarray, Text
import numpy as np

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    result = iris_clf_runner.predict.run(input_series)
    return result

@svc.api(input=JSON(), output=Text())
def myreadyz(input):
    data = [[0.1, 0.1, 0.1, 0.1]]
    result = iris_clf_runner.predict.run(data)
    ...
    return ""

class CustomReadyzMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        async def _send(message):
            if message["type"] == "http.response.start":
                message = {
                    "type": "http.response.start",
                    "status": 503,
                    "headers": [
                        [b"content-type", b"text/plain"],
                        [b"content-length", b"3"],
                    ],
                }
            elif message["type"] == "http.response.body":
                if "more_body" in message and message["more_body"]:
                    return
                else:
                    message = {
                        "type": "http.response.body",
                        "body": b"BAD",
                        "more_body": False,
                    }
            await send(message)

        async def _receive():
            await receive()
            message = {
                "type": "http.request",
                "body": b"[[5, 4, 3, 2]]",
                "more_body": False,
            }
            return message

        if "path" in scope and scope["path"] == "/readyz":
            scope["path"] = "/myreadyz"
            scope["method"] = "POST"
            return await self.app(scope, _receive, _send)
        return await self.app(scope, receive, send)

svc.add_asgi_middleware(CustomReadyzMiddleware)
Proof
✗ curl -X GET -v http://127.0.0.1:3000/readyz
Note: Unnecessary use of -X or --request, GET is already inferred.
* Trying 127.0.0.1:3000...
* Connected to 127.0.0.1 (127.0.0.1) port 3000 (#0)
> GET /readyz HTTP/1.1
> Host: 127.0.0.1:3000
> User-Agent: curl/7.77.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Unavailable
< date: Tue, 05 Jul 2022 09:24:11 GMT
< server: uvicorn
< content-type: text/plain
< content-length: 3
<
* Connection #0 to host 127.0.0.1 left intact
BAD%
@ssheng thanks, your code works; however, I don't really see the point of the if message["type"] == "http.response.start" chunk... perhaps I don't understand BentoML's internals well enough. Your code also defeats the purpose of the readiness probe logic, because no matter what happens inside the function, the middleware will just return HTTP 503 "BAD".
I have modified your snippet slightly to look like this. In this case, the readiness probe can return something useful, and we can return the correct status code based on whether or not an exception has occurred within the myreadyz function.
import bentoml
from bentoml.io import NumpyNdarray, JSON, Text
import json
import logging
import numpy as np
import traceback

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    result = iris_clf_runner.predict.run(input_series)
    return result

@svc.api(input=JSON(), output=Text())
def myreadyz(input):
    try:
        data = [[0.1, 0.1, 0.1, 0.1]]
        iris_clf_runner.predict.run(data)
        return json.dumps({"status": "OK"})
    except Exception as e:
        logging.exception(e)
        return json.dumps(
            {
                "status": "Not OK",
                "error": repr(e),
                "traceback": traceback.format_exc(),
            }
        )

class CustomReadyzMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        async def _send(message):
            if message["type"] == "http.response.body":
                message["status"] = 200 if b'"status": "OK"' in message["body"] else 503
            await send(message)

        async def _receive():
            await receive()
            message = {
                "type": "http.request",
                "body": b'{"dummy": "input"}',
            }
            return message

        if "path" in scope and scope["path"] == "/readyz":
            scope["path"] = "/myreadyz"
            scope["method"] = "POST"
            return await self.app(scope, _receive, _send)
        return await self.app(scope, receive, send)

svc.add_asgi_middleware(CustomReadyzMiddleware)
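A side note on the question about the http.response.start chunk: in the ASGI protocol the HTTP status code travels on the "http.response.start" message, not on the body message, which is presumably why the original example intercepted it. A rough sketch of a send wrapper that holds the start message until the body arrives and derives the status from it, assuming a single-chunk body and applied only to the /readyz-redirected request, might look like this:

class StatusFromBodyMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        pending_start = None

        async def _send(message):
            nonlocal pending_start
            if message["type"] == "http.response.start":
                # hold the start message: in ASGI the status code lives here
                pending_start = message
                return
            if message["type"] == "http.response.body" and pending_start is not None:
                ok = b'"status": "OK"' in message.get("body", b"")
                pending_start["status"] = 200 if ok else 503
                await send(pending_start)
                pending_start = None
            await send(message)

        await self.app(scope, receive, _send)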
@jiewpeng Did this solution work out for you? Sorry about not following up. We were busy with the 1.0 release.
@yubozhao yes, the solution worked, though I modified it slightly to fit what I needed. Still, I feel such functionality should be easier to customize - this solution requires the user to fiddle around with what feels more like the internals of BentoML, and if at some point BentoML changes the way its model server works, this solution may break.
The BentoML HTTP server follows the ASGI protocol, so in most cases custom middleware should be supported.
The readiness endpoint should be stable; I don't foresee any huge breaking changes like this in the future.
Maybe we should have a better tutorial on how to customize middleware. What do you think @yubozhao?
the middleware will just return HTTP 503 "BAD".
@jiewpeng, yes, the intention here was to demonstrate how to customize the health check behavior. Glad to see everything worked out. On our side, we should better document health check customization.
Is your feature request related to a problem? Please describe.
Currently, the readiness probe does not check anything; it just returns 200 OK once the app has started up. However, if the developer accidentally introduces a bug into the bento/service file when modifying it, the deployment would be marked as ready when it should not be. This lets a bugged deployment replace a previously working one, resulting in downstream failures.

Describe the solution you'd like
Allow the user to customize the readiness probe / readyz behaviour with a custom function, for instance to call the model with a known valid input and assert that the model returns a valid output, before marking the deployment as ready. This would also allow developers to assert that connections to external resources such as a feature store are working correctly before marking a deployment as ready to accept connections.

Describe alternatives you've considered
The developer can create a new route that is not called /readyz (since it is reserved) to perform these checks, and then modify the Kubernetes deployment to use this route as the readiness probe. Not sure whether this is compatible with Yatai since I do not use it.

Additional context
None
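To make the request concrete, here is a purely hypothetical sketch of the kind of hook being asked for; add_readiness_check is an invented name, not an existing BentoML API:

# Hypothetical sketch only: add_readiness_check is NOT a real BentoML API;
# it illustrates the kind of customization this feature request describes.
def model_is_ready() -> bool:
    sample = [[5.0, 3.5, 1.5, 0.2]]
    result = iris_clf_runner.predict.run(sample)
    return result is not None

svc.add_readiness_check(model_is_ready)  # hypothetical API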