-
Hi
I tried the 13B version in TGI, and it works fine with bitsandbytes quantization.
However, when trying AWQ quantization in TGI, it fails with the error "Cannot load 'awq' weight, make sure the model is al…
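For context, a minimal sketch of the difference between the two modes (model IDs and the volume path are illustrative; this assumes the standard TGI Docker image): `--quantize bitsandbytes` quantizes a full-precision checkpoint on the fly, while `--quantize awq` expects a checkpoint that was already quantized with AWQ, which is what the error is pointing at.

```shell
# On-the-fly quantization of a full-precision checkpoint (the case that works above)
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id meta-llama/Llama-2-13b-chat-hf --quantize bitsandbytes

# AWQ mode loads pre-quantized AWQ weights (e.g. a *-AWQ checkpoint);
# pointing it at full-precision weights triggers the "Cannot load 'awq' weight" error
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id TheBloke/Llama-2-13B-chat-AWQ --quantize awq
```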
-
I recently tried to build TGI 2.0.1 again but encountered a new error:
```
Installed /server9/cbj/programming/anaconda3/envs/tgi_server/lib/python3.11/site-packages/typer-0.12.3-py3.11.egg
error: h11 0.…
```
-
We need sample code and a tutorial for running LLaMa 2 with TGI.
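As a starting point, here is a minimal sketch of querying a running TGI server from Python (it assumes TGI is already serving a LLaMa 2 model on localhost:8080; the `/generate` route and the `inputs`/`parameters` payload shape follow TGI's REST API):

```python
import json
import urllib.request

def build_payload(prompt, max_new_tokens=64, temperature=0.7):
    # TGI's /generate route takes the prompt under "inputs" and
    # sampling options under "parameters"
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }

def generate(prompt, url="http://localhost:8080/generate", **params):
    # POST the JSON payload and return the generated text
    data = json.dumps(build_payload(prompt, **params)).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```

With a server running, `generate("What is deep learning?", max_new_tokens=32)` returns the completion string.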
-
Speculative sampling is a technique for improving the throughput of LLMs, and customers have requested that it be supported on Inf2.
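As background, a toy sketch of the idea (the greedy lambda "models" below are illustrative stand-ins for real draft/target LMs): a cheap draft model proposes a block of tokens, the target model verifies them and keeps only the longest matching prefix, so several tokens can be accepted per target pass without changing the target's output.

```python
def speculative_decode(target, draft, prompt, k=4, max_tokens=8):
    """target/draft: fn(seq) -> next token (greedy stand-ins for real LMs)."""
    seq = list(prompt)
    while len(seq) - len(prompt) < max_tokens:
        # 1. the cheap draft model proposes k tokens autoregressively
        ctx = seq[:]
        proposal = []
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2. the target model verifies; keep the longest matching prefix
        ctx = seq[:]
        for t in proposal:
            if target(ctx) != t:
                break
            ctx.append(t)
        seq = ctx
        # 3. the target always contributes one token (correction or bonus),
        #    so at least one token is generated per target pass
        seq.append(target(seq))
    return seq[len(prompt):][:max_tokens]

# toy stand-ins: the target counts up by 1; the draft is wrong after multiples of 3
target = lambda s: s[-1] + 1
draft = lambda s: s[-1] + 2 if s[-1] % 3 == 0 else s[-1] + 1

# the output is identical to decoding with the target alone
print(speculative_decode(target, draft, [0]))  # → [1, 2, 3, 4, 5, 6, 7, 8]
```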
-
### System Info
TGI docker image on GCP.
GPU: A100
Model: Phi-3
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [X] An officially supported command
- [ ] My own modifications…
-
### Model description
https://github.com/huggingface/text-generation-inference/pull/1709
Since TGI has added LLaVA support, I would like to know if there is any timeline for the LLaVA support o…
-
To allow multi-tenant inference:
- [ ] Explore vLLM/HuggingFace TGI
- [ ] Fallback implement baseline FastAPI with batch processing
-
- the model should be loaded
- more?
@Xaenalt
-
Hi, this is the INC team from Intel. Thank you for developing this amazing project.
### Motivation
Our team has developed a new weight-only quantization algorithm called Auto-Round. It has achie…
-
I get "text_generation_launcher: Method Warmup encountered an error." at the final stage:
```
2024-03-30T14:16:55.598106565Z 2024-03-30T14:16:55.597709Z ERROR warmup{max_input_length=3000 max_prefil…
```