-
### Feature request
I tried to run Llama-3 on TGI (1.3). The model kind of works, but it doesn't stop at the EOS tokens. I suspect TGI doesn't "understand" Llama-3's new tokenization scheme and promp…
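As a hedged workaround sketch (the right fix depends on the TGI version), Llama-3's end-of-turn token `<|eot_id|>` can be passed as an explicit stop sequence via TGI's `stop` generate parameter; the endpoint URL below is a placeholder, not part of the original report:

```python
import json

# Llama-3 ends assistant turns with <|eot_id|>; older TGI builds may not
# treat it as an EOS token, so we pass it as an explicit stop sequence.
# The URL is a placeholder for your own TGI endpoint.
TGI_URL = "http://localhost:8080/generate"

payload = {
    "inputs": (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        "Hello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    ),
    "parameters": {
        "max_new_tokens": 256,
        "stop": ["<|eot_id|>"],  # halt generation at the turn marker
    },
}

body = json.dumps(payload)
# e.g. requests.post(TGI_URL, data=body,
#                    headers={"Content-Type": "application/json"})
print(body)
```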
-
### Feature request
Currently TGI NeuronX loads the artifacts with the NeuronModelForCausalLM class, which raises an error when loading Flan-T5.
```
Unrecognized configuration class for this kind of Aut…
-
Ability to export TGIs as CSV and/or SQLite DB for individual analysis.
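A minimal sketch of what such an export could look like, assuming the analysis records are available as plain dicts; the field names (`id`, `prompt`, `tokens`, `latency_ms`) and table name are invented for illustration:

```python
import csv
import sqlite3

# Hypothetical records; real TGI analysis data would replace these.
rows = [
    {"id": 1, "prompt": "hello", "tokens": 12, "latency_ms": 85.0},
    {"id": 2, "prompt": "bye", "tokens": 7, "latency_ms": 41.5},
]

def export_csv(path, records):
    """Write records to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)

def export_sqlite(path, records):
    """Write records to a SQLite table for ad-hoc SQL analysis."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS generations "
        "(id INTEGER PRIMARY KEY, prompt TEXT, tokens INTEGER, latency_ms REAL)"
    )
    con.executemany(
        "INSERT OR REPLACE INTO generations "
        "VALUES (:id, :prompt, :tokens, :latency_ms)",
        records,
    )
    con.commit()
    con.close()

export_csv("generations.csv", rows)
export_sqlite("generations.db", rows)
```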
-
**Acceptance criteria:**
- [x] Architecture diagram for Caikit/TGIS and ODH/RHODS
- [X] ADR
-
### System Info
ghcr.io/huggingface/text-generation-inference:2.0.4 & 2.1.0
Ubuntu 22.04 server, 8xA6000.
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [X] An officially suppor…
-
### Feature request
Hello, our models are deployed with TGI (v1.4.3), and we also want to use LoRAX. But I find that the TGI version LoRAX is based on is very different from TGI v1.4.3.
We …
-
I have updated my h2oGPT Docker version and now my Docker won't start.
The error thrown is: Malformed inference server.
It is a bug because a few lines earlier in the command line it succeeded in generat…
-
I have a fine-tuned Llama 2 7B chat model which I am deploying to an endpoint using a DJL container. After deploying, when I tested the model, the model output quality had degraded (The output seems to be…
-
### System Info
Tests run via dedicated endpoints and Idefics2.
TGI version was probably 2.0.2
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [X] An officially suppo…
-
For 4096 tokens (which is forced by Omost), using a Llama-3 model on a 4090, it takes 120 s to complete the prompt, while SD takes only 7 s. It's a big gap.
How can we accelerate the local GPT?
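To put a number on the gap, a quick back-of-the-envelope throughput calculation using the figures from the report (4096 tokens, 120 s):

```python
# Rough decode throughput implied by the reported numbers.
tokens = 4096
seconds = 120.0
tok_per_s = tokens / seconds
print(f"{tok_per_s:.1f} tokens/s")  # ~34.1 tokens/s
```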