basetenlabs / truss-examples

Examples of models deployable with Truss
https://trussml.com
MIT License

vllm health check #338

Closed (tianshuc0731 closed 1 month ago)

tianshuc0731 commented 1 month ago

Motivation

Enable Truss to automatically detect a vLLM engine crash and recover by restarting.

Changes

At the end of the model loading step, we start a long-running background thread that probes vLLM engine/server health every HEALTH_CHECK_INTERVAL seconds and triggers os.exit(1) when a health check fails.
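The mechanism described above could look roughly like the following. This is a minimal sketch, not the code from this PR: `start_health_check`, `probe`, and `on_failure` are hypothetical names, and the interval value is an assumption. Python has no `os.exit`, so the sketch uses `os._exit(1)`, which ends the whole process even when called from a background thread (`sys.exit` in a thread would only end that thread).

```python
import os
import threading
import time
from typing import Callable

# Assumed value for illustration; the interval used in the PR is not stated.
HEALTH_CHECK_INTERVAL = 5.0  # seconds


def start_health_check(
    probe: Callable[[], bool],
    interval: float = HEALTH_CHECK_INTERVAL,
    on_failure: Callable[[], None] = lambda: os._exit(1),
) -> threading.Thread:
    """Start a daemon thread that calls `probe` every `interval` seconds.

    `probe` is a hypothetical callable that returns True while the engine or
    server is healthy. A False result or a raised exception counts as a
    failed health check and triggers `on_failure` once.
    """

    def loop() -> None:
        while True:
            try:
                healthy = probe()
            except Exception:
                healthy = False
            if not healthy:
                # Default: exit with status 1 so the supervisor restarts us.
                on_failure()
                return
            time.sleep(interval)

    thread = threading.Thread(target=loop, name="vllm-health-check", daemon=True)
    thread.start()
    return thread
```

In a Truss model this would presumably be called as the last statement of `load()`, with a `probe` that either checks the in-process engine or queries the server, depending on the mode. Making `on_failure` injectable lets the loop be tested without killing the test process.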

Testing

In both standard and OpenAI-compatible server modes, Truss automatically detects the vLLM engine failure and restarts successfully.

Use case 1: standard vLLM engine mode; simulate an engine crash by shutting down the vLLM engine event loop.

Screenshot 2024-08-19 at 7 35 50 PM

Use case 2: OpenAI-compatible server mode; simulate a server crash by killing the server process.

Screenshot 2024-08-19 at 6 39 20 PM