bentoml / OpenLLM

Run any open-source LLMs, such as Llama 3.1 and Gemma, as OpenAI-compatible API endpoints in the cloud.
https://bentoml.com
Apache License 2.0

bug: [WARNING] [api_server:llm-llama-service:3] Timed out waiting for runner to be ready #853

Closed · mfournioux closed this 3 months ago

mfournioux commented 9 months ago

Describe the bug

I have launched a BentoML server with a vLLM backend on Kubernetes (k8s).
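
For context, a minimal sketch of the kind of service definition involved here (this is an assumption, not the reporter's actual `_service:svc`; the model id is a placeholder, and `openllm.Runner` follows the pre-0.6 OpenLLM API):

```python
import bentoml
import openllm

# Assumed setup: an OpenLLM runner with the vLLM backend, matching the
# "llm-llama-service" name seen in the logs. Model id is illustrative.
llm_runner = openllm.Runner("codellama", backend="vllm")

svc = bentoml.Service(name="llm-llama-service", runners=[llm_runner])

@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text())
async def generate(prompt: str) -> str:
    # Forward the prompt to the runner; the exact return structure of
    # generate() varies across OpenLLM versions, so we stringify it here.
    result = await llm_runner.generate.async_run(prompt)
    return str(result)
```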

Once the model is loaded (CodeLlama 13B Instruct in float16), the pod logs show the following:

[INFO] [cli] Starting production HTTP BentoServer from "_service:svc" listening
[WARNING] [api_server:llm-llama-service:1] Timed out waiting for runner to be ready

I don't understand why this warning appears, saying the runner is not ready, when the previous line tells me that the server is ready to listen.

Do you have any explanation for why this warning pops up after the log states that the server is ready to listen?
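
For what it's worth, the two lines are not necessarily contradictory. A toy illustration of the pattern (this is not BentoML's actual code): the HTTP server starts listening first, and a separate loop then polls each runner until it answers or a deadline passes. If the runner is still loading weights (a 13B model in float16 can take a while), the deadline can expire and the warning is emitted even though the HTTP server itself is up:

```python
import asyncio
import time

async def runner_is_ready() -> bool:
    # Placeholder for the real readiness check (e.g. a ping to the
    # runner process); returns False while the model is still loading.
    return False

async def wait_for_runner(timeout_s: float = 60.0, period_s: float = 1.0) -> None:
    # Poll the runner until it reports ready or the deadline passes.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if await runner_is_ready():
            return
        await asyncio.sleep(period_s)
    # The server keeps running; this is only a warning, not a crash.
    print("[WARNING] Timed out waiting for runner to be ready")

asyncio.run(wait_for_runner(timeout_s=3.0, period_s=1.0))
```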

Many thanks for your help

To reproduce

No response

Logs

No response

Environment

Kubernetes (K8s), Python 3.10

System information (Optional)

No response

bojiang commented 3 months ago

Closing in favor of OpenLLM 0.6.
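
For anyone landing here later: OpenLLM 0.6 rewrote the serving stack and exposes an OpenAI-compatible endpoint, so the runner-probe machinery behind this warning no longer applies. A hedged sketch of querying a locally served model through that endpoint (the port, model id, and no-auth key are assumptions, not confirmed by this thread):

```python
from openai import OpenAI

# Assumes a server started with `openllm serve <model>` and listening
# locally; base_url and api_key here are placeholders for your setup.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")

resp = client.chat.completions.create(
    model="codellama",  # placeholder; use the model id your server reports
    messages=[{"role": "user", "content": "Write a hello-world in Python."}],
)
print(resp.choices[0].message.content)
```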