GoogleCloudPlatform / ai-on-gke

AI on GKE is a collection of examples, best-practices, and prebuilt solutions to help build, deploy, and scale AI Platforms on Google Kubernetes Engine
Apache License 2.0

Support vllm openai api server #694

Closed · zmvictor closed this 1 month ago

zmvictor commented 1 month ago

Per https://docs.vllm.ai/en/latest/serving/metrics.html, the OpenAI API server exposes vLLM serving metrics by default. This PR therefore switches the vLLM deployment to the OpenAI-compatible API server so the metrics endpoint comes for free.

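For context, the change boils down to which vLLM entrypoint the serving container runs. A minimal sketch (the exact flags in this PR may differ):

# Legacy server: exposes /generate; the linked docs describe metrics on the OpenAI server
$ python -m vllm.entrypoints.api_server --model meta-llama/Llama-2-7b-chat-hf

# OpenAI-compatible server: exposes /v1/completions and Prometheus /metrics by default
$ python -m vllm.entrypoints.openai.api_server --model meta-llama/Llama-2-7b-chat-hf
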
End-to-end tested with the model meta-llama/Llama-2-7b-chat-hf. After terraform apply:

# Get vLLM LB's external IP
$ VLLM_EXTERNAL_IP=$(kubectl -n benchmark get service vllm -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# send a prompt to the endpoint
$ curl $VLLM_EXTERNAL_IP/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta-llama/Llama-2-7b-chat-hf",
        "prompt": "Seattle City is a",
        "max_tokens": 7,
        "temperature": 0
    }'
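
# For reference, the endpoint returns an OpenAI-style completion object.
# A rough sketch of the response shape (values are illustrative placeholders,
# except the token counts, which line up with the metrics below):
# {
#   "id": "cmpl-...",
#   "object": "text_completion",
#   "model": "meta-llama/Llama-2-7b-chat-hf",
#   "choices": [{"index": 0, "text": "...", "finish_reason": "length"}],
#   "usage": {"prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16}
# }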

# Check prometheus metrics
$ curl $VLLM_EXTERNAL_IP/metrics/

...
# TYPE vllm:prompt_tokens_total counter
vllm:prompt_tokens_total{model_name="meta-llama/Llama-2-7b-chat-hf"} 9.0
...
achandrasekar commented 1 month ago

Hi @zmvictor, I just remembered that this breaks the benchmark automation we have for vLLM, which still uses the /generate API rather than the /completions API: https://github.com/GoogleCloudPlatform/ai-on-gke/blob/main/benchmarks/benchmark/tools/locust-load-inference/locust-docker/locust-tasks/tasks.py#L172. It would be good to address that too.
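
For context, the incompatibility is that the locust task posts to the legacy /generate route, which the OpenAI-compatible server does not serve (the request would 404). A rough sketch of the now-failing request (payload fields are assumptions based on vLLM's legacy API, not copied from tasks.py):

# Legacy endpoint the locust task targets; the OpenAI API server has no /generate route
$ curl $VLLM_EXTERNAL_IP/generate \
    -H "Content-Type: application/json" \
    -d '{
        "prompt": "Seattle City is a",
        "max_tokens": 7,
        "temperature": 0
    }'

The fix would be to point the task at /v1/completions instead (as in the example above) and add the required "model" field to the payload.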