bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
https://bentoml.com

API server SLOs #2147

Closed · parano closed this issue 1 year ago

parano commented 2 years ago
ssheng commented 2 years ago

Should we implement max latency similar to the deadline feature in gRPC, or have a max latency per runner?
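For context on what a per-runner deadline could look like: BentoML 1.0 reads runner settings from a configuration file, and `runners.timeout` (shown later in this thread) sets a global deadline. A minimal sketch, assuming per-runner overrides are nested under `runners.<runner_name>` and using a hypothetical runner name:

```yaml
# bentoml_configuration.yaml -- a sketch; per-runner scoping is an assumption
runners:
  timeout: 300            # global default: seconds before a runner request times out
  my_slow_runner:         # hypothetical runner name; settings here override the default
    timeout: 3600         # longer deadline for this runner only
```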

chris-aeviator commented 2 years ago

Happy to test the implementation down the road & provide feedback. I have a 100% reproducible situation where I run into timeouts even though the code runs fine (I see my result in the terminal).

nadworny commented 1 year ago

hey @parano / @bojiang, is the timeout config already implemented? I don't see it in `bentoml serve` (1.0.16) yet... is it possible to pass this config to the container in some other way?

found it, thx: https://docs.bentoml.org/en/latest/guides/configuration.html

```bash
docker run -e BENTOML_CONFIG_OPTIONS='runners.timeout=3600' -it --rm -p 3000:3000 your_service serve --production
```
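The same setting can also be supplied as a full configuration file via the `BENTOML_CONFIG` environment variable, which the same configuration page documents. A sketch, reusing the placeholder image name from above and an assumed mount path:

```bash
# Sketch: mount a configuration file instead of passing inline options.
# The file path and image name (your_service) are placeholders.
docker run \
  -v "$(pwd)/bentoml_configuration.yaml:/home/bentoml/bentoml_configuration.yaml" \
  -e BENTOML_CONFIG=/home/bentoml/bentoml_configuration.yaml \
  -it --rm -p 3000:3000 your_service serve --production
```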

frostming commented 1 year ago

Yes, but the app-level timeout config doesn't work currently. We will work on improving this. Thank you.