Status: Open · peterschmidt85 opened this issue 1 month ago
Steps to reproduce:

Run a service with the `model` mapping's `format` set to `openai`.

Example:

```yaml
type: service
name: llama31-service-tgi

replicas: 1..2
scaling:
  metric: rps
  target: 30

volumes:
  - name: llama31-volume
    path: /data

image: ghcr.io/huggingface/text-generation-inference:latest
env:
  - HUGGING_FACE_HUB_TOKEN
  - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
  - MAX_INPUT_LENGTH=4000
  - MAX_TOTAL_TOKENS=4096
commands:
  - text-generation-launcher
port: 80

spot_policy: auto

resources:
  gpu: 24GB

model:
  format: openai
  type: chat
  name: meta-llama/Meta-Llama-3.1-8B-Instruct
```

Actual behaviour:

- Access `https://<run name>.<gateway domain>/v1/chat/completions` (with `/v1`). It works.
- Access `https://gateway.<gateway domain>/chat/completions` (without `/v1`). It works.
- Access `https://gateway.<gateway domain>/v1/chat/completions` (with `/v1`). It doesn't work.

Expected behaviour:

The gateway endpoint works both with and without the `/v1` prefix (similar to the behavior of OpenAI).
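For context, the request sent to each of the URLs above can be sketched as follows. The payload is illustrative (prompt and `max_tokens` are arbitrary); the model name comes from the `model` mapping in the configuration, and the domain placeholders are kept as-is:

```python
import json

# Chat-completions payload in the OpenAI-compatible format. The model name
# matches `model.name` from the service configuration above; the message
# content and max_tokens are illustrative only.
payload = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 16,
}

# The three endpoints under test (placeholders left as in the report):
endpoints = [
    "https://<run name>.<gateway domain>/v1/chat/completions",   # works
    "https://gateway.<gateway domain>/chat/completions",         # works
    "https://gateway.<gateway domain>/v1/chat/completions",      # fails
]

# Serialized request body, as it would be POSTed to each endpoint:
body = json.dumps(payload)
```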
This issue is stale because it has been open for 30 days with no activity.
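The expected behaviour can be sketched with a hypothetical path-normalization helper (this is not dstack's actual gateway code, just an illustration of the requested semantics): a leading `/v1` is treated as optional, so both URL forms resolve to the same chat-completions route.

```python
def normalize_openai_path(path: str) -> str:
    """Strip an optional leading /v1 so that /v1/chat/completions and
    /chat/completions route to the same OpenAI-compatible handler.

    Hypothetical helper for illustration; not part of dstack.
    """
    if path == "/v1" or path.startswith("/v1/"):
        path = path[len("/v1"):]
    return path or "/"

# Both forms map to the same route, matching the expected behaviour:
assert normalize_openai_path("/v1/chat/completions") == "/chat/completions"
assert normalize_openai_path("/chat/completions") == "/chat/completions"
```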