dstackai / dstack

dstack is an open-source alternative to Kubernetes, designed to simplify development, training, and deployment of AI across any cloud or on-prem. It supports NVIDIA, AMD, and TPU.
https://dstack.ai/docs
Mozilla Public License 2.0
1.32k stars 98 forks source link

Ensure the gateway endpoint works with the `/v1` URL path prefix #1546

Open peterschmidt85 opened 1 month ago

peterschmidt85 commented 1 month ago

Steps to reproduce:

  1. Run a service with model mapping using openai format

Example:

type: service
name: llama31-service-tgi

replicas: 1..2
scaling:
  metric: rps
  target: 30

volumes:
 - name: llama31-volume
   path: /data

image: ghcr.io/huggingface/text-generation-inference:latest
env:
  - HUGGING_FACE_HUB_TOKEN
  - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
  - MAX_INPUT_LENGTH=4000
  - MAX_TOTAL_TOKENS=4096
commands:
  - text-generation-launcher
port: 80

spot_policy: auto

resources:
  gpu: 24GB

model:
  format: openai
  type: chat
  name: meta-llama/Meta-Llama-3.1-8B-Instruct

Actual behaviour:

  1. Run endpoint with /v1: Access https://<run name>.<gateway domain>/v1/chat/completions. It works
  2. Gateway endpoint without /v1: Access https://gateway.<gateway domain>/chat/completions. It work
  3. Gateway endpoint with /v1: Access https://gateway.<gateway domain>/v1/chat/completions. It doesn't work

Expected behaviour:

  1. Gateway endpoint with /v1 works with and without /v1 (similar to the behavior of OpenAI)
github-actions[bot] commented 5 days ago

This issue is stale because it has been open for 30 days with no activity.