-
### Feature request
TGI provides some valuable metrics on model performance and load today. However, there are still a number of missing metrics, the absence of which poses a challenge for orchestr…
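For context on what is exposed today, TGI serves Prometheus-format metrics from a `/metrics` endpoint on the same port as the HTTP API. A minimal sketch of inspecting it (the host and port mapping here are assumptions, not taken from this issue):

```shell
# Hypothetical local deployment: container port 80 mapped to 8080.
# Lists the Prometheus metrics TGI currently exports.
curl -s http://localhost:8080/metrics | grep '^tgi_'
```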
-
I've been trying to run various finetuned versions of supported models on GKE. However, it gets stuck on `Using the Hugging Face API to retrieve tokenizer config`.
These are the full logs:
```…
-
### System Info
Hi, on Hugging Face Inference Endpoints, TGI works for classifiers, but here it doesn't. Is the DeBERTa v3 classifier not supported?
### Information
- [ ] Docker
- [X] The CLI d…
-
### System Info
Running a TGI 2.0.3 Docker container on an 8× NVIDIA L4 VM with one L4 exposed to Docker.
Command:
```sh
MODEL=google/gemma-2b-it
docker run \
-m 320G \
--shm-size=40G \
-e NVIDIA_VI…
-
### Feature request
Right now the stop logic in TGI supports stopping on tokens; OpenAI is more flexible, as it can stop on sub-tokens and sequences of sub-tokens.
For example, comparing llama-3-8b-…
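For reference, this is the shape of the OpenAI-style request the comparison is about: the `stop` field accepts arbitrary strings (up to four), which the server matches against decoded text rather than against single token ids. The model name and stop values below are illustrative placeholders, not taken from this issue.

```python
import json

# Sketch of an OpenAI-style chat completion request body. The "stop"
# entries are arbitrary strings ("sub-token sequences"), e.g. a blank
# line or a literal substring of the expected output.
payload = {
    "model": "gpt-4o-mini",  # placeholder model name
    "messages": [{"role": "user", "content": "Count to ten."}],
    "stop": ["\n\n", "7"],  # stop on either string, not on a token id
}
body = json.dumps(payload)
```

The point of the feature request is that matching happens on the decoded string, so a stop criterion can span multiple tokens or fall inside one.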
-
Hi! I wanted to ask for support for [TGI](https://github.com/huggingface/text-generation-inference) as a provider.
I can probably work on this later this week.
-
### System Info
uname -a
Linux a3eb1d6a21b4 5.4.0-174-generic #193-Ubuntu SMP Thu Mar 7 14:29:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
cargo --version
cargo 1.78.0 (54d8815d0 2024-03-26)
…
-
### Feature Description
Hi,
I was hoping to request a feature to allow access to the raw response generated by the LLM in the StreamingAgentChatResponse that is returned by any of llama_index's ch…
-
### Feature request
Using CORS_ALLOW_ORIGIN, I was able to set a single origin to avoid CORS issues.
`CORS_ALLOW_ORIGIN=123.45.67.8:1234`
How do I set it up for multiple origins?
### Motivation
…
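One possible direction, sketched under the assumption that the launcher's `--cors-allow-origin` flag is declared as a repeatable list argument (worth confirming with `text-generation-launcher --help`); all origins below are placeholders:

```shell
# Hypothetical sketch: pass the flag once per allowed origin instead of
# relying on the single-value CORS_ALLOW_ORIGIN environment variable.
docker run --gpus all -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id google/gemma-2b-it \
  --cors-allow-origin http://123.45.67.8:1234 \
  --cors-allow-origin http://other-frontend.internal:3000
```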
-
*Concise Description:*
I deployed Llama-3-8B-Instruct on SageMaker using the latest container. During inference, the model does not stop generating tokens.
*DLC image/dockerfile:*
763104351884.dk…
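A commonly reported workaround for this symptom (not verified against this particular deployment) is that Llama-3 Instruct ends assistant turns with the `<|eot_id|>` token, and generation runs on if that token is not configured as a stop criterion. A sketch of passing it explicitly in the request body; the prompt text is illustrative:

```python
import json

# Hypothetical request body for a TGI-backed SageMaker endpoint:
# "stop" lists strings at which decoding should halt. Llama-3 Instruct
# emits "<|eot_id|>" at the end of each turn.
payload = {
    "inputs": (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        "Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    ),
    "parameters": {"max_new_tokens": 128, "stop": ["<|eot_id|>"]},
}
body = json.dumps(payload).encode("utf-8")
# `body` would be passed as Body= to sagemaker-runtime invoke_endpoint.
```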