Motivation
Enable truss to automatically detect a vLLM engine crash and recover by restarting.
Changes
At the end of the model loading step, we start a long-running background thread that probes the vLLM engine/server health every HEALTH_CHECK_INTERVAL seconds and triggers os.exit(1) when a health check fails, so that the process exits and can be restarted.
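The monitoring loop described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `start_health_monitor`, `is_healthy`, and `on_failure` are hypothetical names, the 5-second value for HEALTH_CHECK_INTERVAL is assumed, and the probe is a placeholder callable standing in for whatever engine or server check is actually used. The sketch uses `os._exit(1)` because Python's standard library has no `os.exit`, and `sys.exit` called from a non-main thread only ends that thread rather than the process.

```python
import os
import threading
import time
from typing import Callable

HEALTH_CHECK_INTERVAL = 5.0  # seconds; assumed value for illustration


def _exit_process() -> None:
    # os._exit terminates the whole process immediately, even when
    # called from a background thread.
    os._exit(1)


def start_health_monitor(
    is_healthy: Callable[[], bool],
    interval: float = HEALTH_CHECK_INTERVAL,
    on_failure: Callable[[], None] = _exit_process,
) -> threading.Thread:
    """Start a daemon thread that probes health every `interval` seconds."""

    def _loop() -> None:
        while True:
            try:
                healthy = is_healthy()
            except Exception:
                # A probe that raises is treated as a failed health check.
                healthy = False
            if not healthy:
                on_failure()
                return
            time.sleep(interval)

    thread = threading.Thread(target=_loop, name="vllm-health-monitor", daemon=True)
    thread.start()
    return thread
```

Under these assumptions, the model loading step would end with a call such as `start_health_monitor(probe)`, where `probe` wraps the engine or server health check. Passing `on_failure` explicitly keeps the loop testable without killing the test process.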
Testing
In both standard and OpenAI-compatible server modes, truss detects a vLLM engine failure automatically and restarts successfully.
Use case 1: standard vLLM engine mode, with an engine crash simulated by shutting down the vLLM engine event loop
Use case 2: OpenAI-compatible server mode, with a server crash simulated by killing the server process