Describe the bug
I have launched a BentoML server with a vLLM backend on k8s.
Once the model is loaded (CodeLlama 13B Instruct in float16), the pod logs are the following:
[INFO] [cli] Starting production HTTP BentoServer from "_service:svc" listening
[WARNING] [api_server:llm-llama-service:1] Timed out waiting for runner to be ready
I don't understand why this warning appears, telling me that the runner is not ready, when the previous line says that the server is ready to listen.
Do you have any explanation for why this warning pops up after the logs report that the server is listening?
Many thanks for your help
To reproduce
No response
Logs
No response
Environment
Kubernetes (k8s), Python 3.10
System information (Optional)
No response