bentoml / OpenLLM

Run any open-source LLMs, such as Llama 3.1 and Gemma, as OpenAI-compatible API endpoints in the cloud.
https://bentoml.com
Apache License 2.0

feat: support enforce_eager option from cli #1003

Closed. ADcorpo closed this issue 6 days ago.

ADcorpo commented 1 month ago

Feature request

Support passing --enforce_eager through to vLLM from the command line, like so:

openllm start repo/model --port 3000 --enforce_eager

Motivation

Since vLLM 0.2.7, CUDA graph capture is enabled by default, which can take up to 3 GiB of VRAM in addition to the model. On my hardware, this triggers an out-of-memory error.

vLLM supports the argument `enforce_eager=True` to disable CUDA graph capture, but I was not able to pass this argument through the openllm command-line interface.
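
For reference, this is roughly how the option maps onto vLLM's own Python API (a minimal sketch, not OpenLLM code; the model id is just a placeholder):

```python
from vllm import LLM

# enforce_eager=True skips CUDA graph capture, avoiding the extra VRAM
# it would otherwise reserve on top of the model weights.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # placeholder model id
    enforce_eager=True,
)
```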

Other

Thank you for your time!

bojiang commented 6 days ago

Supported in 0.6, but in a different way: you can draft your own model / set of configurations in bentoml/openllm-models.
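
As a rough illustration of what such a custom recipe could do, the sketch below constructs the vLLM engine with `enforce_eager=True`; the actual file layout and service code in bentoml/openllm-models may differ, and the model id is a placeholder:

```python
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

# Hypothetical engine setup inside a custom model recipe:
# enforce_eager=True disables CUDA graph capture to reduce VRAM usage.
engine_args = AsyncEngineArgs(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # placeholder model id
    enforce_eager=True,
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```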