-
### Review Mojo's priorities
- [X] I have read the [roadmap and priorities](https://docs.modular.com/mojo/roadmap.html#overall-priorities) and I believe this request falls within the priorities.
###…
-
### Feature request
This framework uses an advanced prefix-cache technique to accelerate inference. It brings more than a 30% improvement compared to TGI without speculation. Is it possible to integrate it in…
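A minimal sketch of the prefix-caching idea mentioned above, under the assumption that it means reusing precomputed state (e.g. attention KV entries) for a shared prompt prefix; the class and function names here are hypothetical, and the "state" is a stand-in so the example stays self-contained:

```python
# Hypothetical sketch of prefix caching: the state computed for a shared
# prompt prefix is stored once and reused by later requests that begin
# with the same tokens, so only the new suffix needs a fresh forward pass.

class PrefixCache:
    def __init__(self):
        self._cache = {}  # maps a token-tuple prefix -> precomputed state

    def lookup(self, tokens):
        """Return (length of the longest cached prefix, its state or None)."""
        for end in range(len(tokens), 0, -1):
            state = self._cache.get(tuple(tokens[:end]))
            if state is not None:
                return end, state
        return 0, None

    def store(self, tokens, state):
        self._cache[tuple(tokens)] = state


def run_model(tokens, cached_state=None, start=0):
    # Stand-in for the real forward pass; here the "state" is just the
    # token list itself so the sketch is runnable without a model.
    return list(tokens)


cache = PrefixCache()
prompt = [1, 2, 3, 4, 5]
cache.store(prompt[:3], run_model(prompt[:3]))

hit_len, state = cache.lookup(prompt)
# Only tokens after the cached prefix need to be recomputed.
new_state = run_model(prompt, cached_state=state, start=hit_len)
```

In a real server the cached state would be KV-cache blocks and the lookup would typically be a trie or block table rather than a linear scan.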
-
### Feature request
An increasingly common question is how to support inference for multiple LoRA models running against a single backbone model. What's preventing TGI from implementing a feature lik…
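The multi-LoRA idea in the request above can be sketched as follows: one frozen backbone weight is shared, and each adapter only contributes a pair of small low-rank matrices, so the per-adapter output is `W @ x + B_i @ (A_i @ x)`. All dimensions and adapter names here are illustrative assumptions:

```python
import numpy as np

# Hypothetical sketch of multi-LoRA serving: one shared backbone weight W
# plus small per-adapter matrices (A, B). Many adapters can be served
# against the same backbone because each adds only a cheap low-rank term.

rng = np.random.default_rng(0)
d, r = 8, 2                      # hidden size and LoRA rank (illustrative)
W = rng.standard_normal((d, d))  # frozen backbone weight

adapters = {
    "adapter_a": (rng.standard_normal((r, d)), rng.standard_normal((d, r))),
    "adapter_b": (rng.standard_normal((r, d)), rng.standard_normal((d, r))),
}

def forward(x, adapter_id=None):
    y = W @ x                    # shared backbone computation
    if adapter_id is not None:
        A, B = adapters[adapter_id]
        y = y + B @ (A @ x)      # per-request low-rank update
    return y

x = rng.standard_normal(d)
base = forward(x)
tuned = forward(x, "adapter_a")
```

In a batched server the interesting part is grouping requests by adapter (or using gathered/batched low-rank matmuls) so the shared backbone pass runs once per batch; this sketch only shows the per-request math.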
-
Please, how can I add support for the Gemma model?
-
### System Info
TGI 1.3
Ubuntu 18
Python 3.10
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [X] An officially supported command
- [ ] My own modifications
### Reproduction
C…
-
### Feature request
I can't find any guidance on integrating HuggingFace TGI and AWS Inferentia.
I've found several documents about deployment guides for individual end-to-end models, but I don't se…
-
I tried to quantize the model **Llama-2-13b-hf** using bitsandbytes, but I found that int4 inference performance is lower than fp16 inference, on both the A100 and the 3090.
Can you tell me why and how …
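One common reason for the slowdown described above is that weight-only int4 quantization still computes in floating point: the packed weights must be dequantized before every matmul, and at small batch sizes that extra elementwise pass can outweigh the memory-bandwidth savings. A toy NumPy illustration of this (not the actual bitsandbytes kernels):

```python
import numpy as np

# Illustrative sketch (not bitsandbytes internals) of weight-only int4
# inference: weights are stored as small integers plus a scale, and must
# be dequantized back to floating point before the matmul. The matmul
# itself still runs in fp, plus the extra dequantization work.

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)).astype(np.float32)

# Symmetric 4-bit quantization: integers in [-8, 7] plus one fp scale.
scale = np.abs(W).max() / 7.0
W_q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)

x = rng.standard_normal(4).astype(np.float32)

# fp16/fp32 path: a single matmul on the stored weights.
y_fp = W @ x

# int4 path: dequantize first, then matmul -- an extra pass over W.
y_q = (W_q.astype(np.float32) * scale) @ x

max_err = np.abs(y_fp - y_q).max()  # small quantization error remains
```

The trade-off pays off when loading the model is memory-bound (large weights, small activations); when compute or kernel overhead dominates, fp16 can win, which matches the observation in the issue.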
-
### System Info
Thank you for adding support for Medusa. In my comparison of Medusa models against the original base models with TGI, the base models appeared to be quicker.
I tested the below models:…
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a…
-
Current process
1. Issues in backlog: every new issue, plus issues that are not under development but are being tracked for various reasons (for example: issues that are (1) interesting but no resources ava…