-
[This library](https://github.com/huggingface/text-generation-inference) from HF is pretty great, and I get a lot of use out of it in production settings for LLMs. Would love to figure out how to integrate a sy…
-
**Is your feature request related to a problem? Please describe.**
While using the Inference API for a chatbot-style text-generation model, such as openchat-3.5, it is not possible to set an end of g…
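A client-side workaround is sketched below, under the assumption that the truncated request is about a custom end-of-generation (stop) token; the model id, prompt template, and stop string are taken from openchat-3.5's documented conventions, not from this issue:

```python
# Hedged sketch: stop generation at openchat-3.5's turn delimiter by
# passing it as a stop sequence instead of changing the model's EOS token.
from huggingface_hub import InferenceClient

client = InferenceClient(model="openchat/openchat_3.5")
output = client.text_generation(
    "GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:",
    max_new_tokens=128,
    stop_sequences=["<|end_of_turn|>"],  # treat the delimiter as end of generation
)
print(output)
```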
-
### System Info
```shell
I'm compiling a fine-tuned Llama 3.1 70B model with the below system info on an inf2.48xlarge machine. I'm using neuronX TGI 0.0.25 with AWS SageMaker. I get the below err…
```
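For reference, this kind of deployment is usually wired up with the SageMaker Python SDK roughly as below; the execution role, model id, and environment values are placeholders, not details from this report:

```python
# Hedged sketch of a neuronx TGI deployment on inf2 via SageMaker.
# The role, model id, and env values are placeholders.
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

image_uri = get_huggingface_llm_image_uri("huggingface-neuronx", version="0.0.25")

model = HuggingFaceModel(
    role="<sagemaker-execution-role>",  # placeholder
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "<fine-tuned-llama-3.1-70b>",  # placeholder
        "HF_NUM_CORES": "24",
        "MAX_BATCH_SIZE": "4",
    },
)
predictor = model.deploy(initial_instance_count=1, instance_type="ml.inf2.48xlarge")
```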
-
Hi there!
Galactica [tokenizer](https://huggingface.co/facebook/galactica-120b/blob/main/tokenizer_config.json)'s `eos_token_id` is not set, but it is set in its model config. We account for tokenize…
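For illustration, a fallback of that kind looks roughly like the sketch below in plain transformers (standard APIs only; this is not TGI's actual code path):

```python
# Minimal sketch: fall back to the model config's eos_token_id when the
# tokenizer does not define an EOS token. Illustration only.
from transformers import AutoConfig, AutoTokenizer

model_id = "facebook/galactica-120b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

if tokenizer.eos_token is None and config.eos_token_id is not None:
    # Map the config's id back to a token string and register it.
    tokenizer.eos_token = tokenizer.convert_ids_to_tokens(config.eos_token_id)

print(tokenizer.eos_token, tokenizer.eos_token_id)
```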
-
Hi, thanks for making the code open source. I was able to run inference with frame length 4 on two A6000 GPUs. I wanted to ask how I can assign attributes and custom text prompts while in the inferenci…
-
### Priority
Undecided
### OS type
Ubuntu
### Hardware type
GPU-Nvidia
### Installation method
- [X] Pull docker images from hub.docker.com
- [ ] Build docker images from source
### Deploy met…
-
SentenceVAE/
│
├── encoder.py

```python
import torch
from torch import nn

class SentenceEncoder(nn.Module):
    '''Sentence Encoder with byte-level BPE tokenization, lear…
```
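Since the snippet above is cut off mid-definition, here is a minimal sketch of what such an encoder might look like; every layer choice and size below is an assumption for illustration, not the repository's actual code:

```python
# Hypothetical sketch of a sentence encoder: embed BPE token ids, run a
# small Transformer encoder, and mean-pool into one sentence vector.
import torch
from torch import nn

class SentenceEncoderSketch(nn.Module):
    def __init__(self, vocab_size: int = 50257, hidden_size: int = 768,
                 num_layers: int = 2, num_heads: int = 8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> sentence vectors: (batch, hidden_size)
        hidden = self.encoder(self.embed(token_ids))
        return hidden.mean(dim=1)
```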
-
Hello, thanks for your good work! Text-generation-inference (TGI) supports the deployment of non-core models according to the official documentation:
> https://huggingface.co/docs/text-generation-inferen…
-
Hi,
I'm new to LangChain and LLMs.
I've recently deployed an LLM model using the Hugging Face text-generation-inference library on my local machine.
I've successfully accessed the model using …
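For anyone in the same situation, pointing LangChain at a local TGI server typically looks something like the sketch below; the URL and generation parameters are assumptions:

```python
# Minimal sketch, assuming TGI is serving on localhost:8080. Uses the
# community LangChain wrapper for text-generation-inference.
from langchain_community.llms import HuggingFaceTextGenInference

llm = HuggingFaceTextGenInference(
    inference_server_url="http://localhost:8080/",
    max_new_tokens=256,
    temperature=0.7,
)
print(llm.invoke("What is text-generation-inference?"))
```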
-
### System Info
Xinference is currently updated to 0.13.3, transformers is 4.42.1, and the GPUs are a 4090 and a 3090. When using the transformers engine:
```
xinference launch --model-engine transformers -u glm4-chat -n glm4-chat -s 9 -f pytorch --max_m…
```