-
### Feature request
I suggest better educating developers on how to download and optimize the model at build time (in a container or in a volume) so that the command `text-generation-launcher` serves as f…
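One way to do this is to bake the weights into the image at build time so the launcher never downloads at startup. A minimal sketch, not an official recipe: it assumes `huggingface-cli` is available inside the TGI image (it ships with `huggingface_hub`), and the model id is illustrative.

```dockerfile
# Sketch: pre-download weights at build time so text-generation-launcher
# can start serving immediately (model id is illustrative).
FROM ghcr.io/huggingface/text-generation-inference:latest
ENV HUGGINGFACE_HUB_CACHE=/data
# Assumption: huggingface-cli is present in the image via huggingface_hub.
RUN huggingface-cli download tiiuae/falcon-7b-instruct
# The image's entrypoint is text-generation-launcher; CMD supplies its args.
CMD ["--model-id", "tiiuae/falcon-7b-instruct"]
```

The same idea works with a mounted volume instead of an image layer: run the download once into the volume, then point `HUGGINGFACE_HUB_CACHE` at it.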
-
After I ran the fine-tuning script, it saved the adapter weights.
How can I run it with vLLM or TGI for efficient, fast inference?
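One common approach with vLLM is to serve the base model and attach the saved adapter as a LoRA module. A minimal sketch, assuming the adapter was saved by PEFT to `./adapter`; the base model id and adapter name are illustrative.

```shell
# Sketch: serve the base model with the LoRA adapter attached
# (model id, adapter name, and path are illustrative).
vllm serve meta-llama/Llama-3.1-8B \
    --enable-lora \
    --lora-modules my-adapter=./adapter
# Alternative: merge the adapter into the base weights with PEFT's
# merge_and_unload(), save the merged checkpoint, and point TGI's
# --model-id at that directory.
```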
-
### System Info
[tgi-gaudi](https://github.com/huggingface/tgi-gaudi) v1.2.
```
Error: Traceback (most recent call last):
  File "/usr/local/bin/text-generation-server", line 8, in <module>
    sys.exit(a…
```
-
# Software version
- [Umi-OCR_Rapid_dev_20231114.7z](https://github.com/hiroi-sora/Umi-OCR_v2/releases/download/dev%2F20231114/Umi-OCR_Rapid_dev_20231114.7z)
# Runtime environment
- Ubuntu 20.04
- wine-8.0.2
# As shown in the screenshot
…
-
I would like to understand the differences between optimum-neuron and [transformers-neuronx](https://github.com/aws-neuron/transformers-neuronx).
-
### Feature request
Add inference flags that enrich the output with explainability information or suppress specific input tokens/embedding spaces, as described [here](https://github.com/Aleph-Alpha/AtMan).…
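As a rough illustration of the suppression idea (this is not TGI's API): AtMan-style explanations scale down a token's pre-softmax attention score and observe how the output changes. A minimal NumPy sketch, where the function names and `factor` are my own choices:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def suppress_token(scores, idx, factor=0.1):
    """Scale the pre-softmax attention score of token `idx` by `factor`
    (factor < 1 shrinks its post-softmax attention weight)."""
    mod = scores.copy()
    mod[..., idx] += np.log(factor)  # adding log(factor) multiplies exp(score) by factor
    return softmax(mod)

scores = np.array([2.0, 1.0, 0.5])
base = softmax(scores)
supp = suppress_token(scores, 0)
# The suppressed token's attention weight drops; the weights still sum to 1.
```

Comparing model outputs with and without such suppression is what yields the per-token explanation.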
-
*Problem*
After building images using the Dockerfiles in this repository, following the instructions here: https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker-composer/xeon/README.md
…
-
### System Info
Using ghcr.io/huggingface/text-generation-inference:latest, but the same issue occurs with 0.9 and 1.0.2.
Trying to deploy with model_id `tiiuae/falcon-7b-instruct`.
### Information
- [X] Dock…
-
### The Feature
TGI supports a `truncate` parameter for handling scenarios where the number of input tokens exceeds the model limit.
Please support it.
### Motivation, pitch
User request.
### Twitter / LinkedIn details
_No response_
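For reference, TGI's `truncate` parameter keeps only the last `truncate` input tokens, dropping the oldest ones. A minimal sketch of that left-truncation behavior (the function name is my own):

```python
def truncate_input(token_ids, truncate):
    """Mimic TGI's `truncate` parameter: when the input exceeds the limit,
    keep only the last `truncate` tokens (left truncation)."""
    if truncate is None or len(token_ids) <= truncate:
        return token_ids
    return token_ids[-truncate:]

# e.g. a 6-token prompt truncated to a 4-token budget
print(truncate_input([1, 2, 3, 4, 5, 6], 4))  # → [3, 4, 5, 6]
```

Keeping the tail rather than the head preserves the most recent context, which is usually what chat-style prompts need.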
-
### System Info
TGI version: latest. The model is Cohere Aya 35B, 4-bit bnb-quantized. Originally, I quantized the base model and merged the fine-tuned adapters into it.
### Information
- [X] …