-
How do we deploy this model via an API? Can I deploy it on vLLM or lmdeploy? I can't find any example of running it with HuggingFace transformers.
I want to deploy the 72B and 110B models.
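For the vLLM part of the question, vLLM ships an OpenAI-compatible HTTP server that can be launched from the command line. A minimal launch sketch, using Qwen/Qwen1.5-72B-Chat purely as a placeholder model id (substitute the checkpoint you actually want) and assuming four GPUs:

```shell
# Sketch: serve a 72B model behind vLLM's OpenAI-compatible API server.
# Model id, GPU count, and port are assumptions; adjust to your setup.
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen1.5-72B-Chat \
    --tensor-parallel-size 4 \
    --port 8000
```

Once up, the server exposes `/v1/chat/completions`, so any OpenAI-style client can talk to it. Treat this as a launch-command sketch, not a tested deployment recipe.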
-
Are you still working on this project?
-
### System Info
TGI Version: tried 2.0.3, 2.0.4, and 2.1.1; none of them works, but 2.0.2 does.
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [X] An officially supported command
- [ ]…
-
### Model description
Please add support for HuggingFaceM4/Idefics3-8B-Llama3 in TGI:
_Idefics3 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces t…
-
When deploying TGI locally and running the model through text-generation-launcher, it keeps failing with: Server error: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
![image](https://github.com/WisdomShel…
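A common workaround for this mixed-device error, as a hedged sketch: pin the launcher to a single GPU (or set sharding explicitly) so every tensor lands on one device. `--model-id` and `--num-shard` are real text-generation-launcher flags; the model id below is a placeholder:

```shell
# Sketch: force TGI onto one GPU so cuda:0 and cuda:1 tensors cannot mix.
# "my-org/my-model" is a placeholder for whichever model you are serving.
CUDA_VISIBLE_DEVICES=0 text-generation-launcher \
    --model-id my-org/my-model \
    --num-shard 1
```

If multi-GPU sharding is actually wanted, the inverse (setting `--num-shard` to the GPU count) is the usual configuration to try instead.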
-
### Feature request
Llama 3.1 is out and should be compatible with Neuron, however, it requires `transformers==4.43.1`, and `optimum-neuron` has pinned `transformers` to `4.41.1`.
Note that sin…
-
OS type: Ubuntu
Description: When running the Translation example using Docker Compose, one of the images takes additional time on startup to pull a model from Hugging Face. During this period…
-
### Presentation of the new feature
Logits processors in outlines.processors support nearly every inference engine, offering a "write once, run anywhere" implementation of business logic.
Curren…
lapp0 updated
2 months ago
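To make the "write once, run anywhere" idea concrete, here is a generic illustration of the logits-processor pattern: a callable that rewrites raw logits before sampling. This is a plain-Python sketch of the interface, not the actual outlines.processors API:

```python
# Minimal illustration of a logits-processor: a callable taking the
# generated token ids and the raw logits, returning adjusted logits.
# Engines differ only in the tensor type they pass in; the business
# logic (here, an allow-list mask) is written once.

NEG_INF = float("-inf")

def make_allowlist_processor(allowed_ids):
    """Return a processor that forces sampling into `allowed_ids`."""
    allowed = set(allowed_ids)

    def processor(input_ids, logits):
        # Mask every token id that is not in the allow-list.
        return [l if i in allowed else NEG_INF for i, l in enumerate(logits)]

    return processor

proc = make_allowlist_processor({1, 3})
print(proc([0], [0.5, 1.2, -0.3, 2.0]))  # only ids 1 and 3 keep their logits
```

In a real engine the same mask would be applied to a torch/numpy/mlx tensor, which is exactly the portability the feature request is about.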
-
Request
The ask is to introduce an OpenAI text generation API compatibility layer (chat completion endpoint) to kserve/TGIS.
Why
Having an OpenAI API compatibility layer will allow more open sourc…
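For reference, this is roughly the request body such a compatibility layer would need to accept. The model name and endpoint URL below are placeholders, not anything kserve/TGIS currently exposes:

```python
# Sketch of an OpenAI-style chat completion request body.
import json

payload = {
    "model": "my-model",  # placeholder model id
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 64,
}

# With a compatible server running, the payload would be POSTed to, e.g.:
#   requests.post("http://localhost:8080/v1/chat/completions", json=payload)
print(json.dumps(payload, indent=2))
```

Supporting this shape is what lets existing OpenAI client libraries point at the server unchanged.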
-
For 4096 tokens (which Omost forces), using a Llama-3 model on a 4090, completing the prompt takes 120 s, while SD takes only 7 s. That's a big gap.
How can we accelerate the local GPT?