-
# Proposed Feature
Add an efficient interface for generation probabilities on fixed prompt and completion pairs. For example:
```python
# ... load LLM or engine
prompt_completion_pairs = [
…
-
I tried to deploy an embedding model to an AWS SageMaker endpoint using the provided guide, which uses inference.py to deploy custom code. The endpoint is created and starts, but when I query the end-p…
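For reference, here is a minimal `inference.py` sketch following the SageMaker inference toolkit's handler contract (`model_fn` / `input_fn` / `predict_fn` / `output_fn`). The embedding model is a trivial stand-in, and the JSON request/response shapes are assumptions, not the guide's exact format:

```python
import json

def model_fn(model_dir):
    # Load your embedding model from model_dir; a real handler would do e.g.
    # SentenceTransformer(model_dir). A trivial stand-in for illustration:
    return lambda texts: [[float(len(t))] for t in texts]

def input_fn(request_body, content_type="application/json"):
    # Deserialize the request; reject content types the handler can't parse.
    if content_type != "application/json":
        raise ValueError(f"unsupported content type: {content_type}")
    return json.loads(request_body)["inputs"]

def predict_fn(inputs, model):
    # Run the loaded model on the deserialized inputs.
    return model(inputs)

def output_fn(prediction, accept="application/json"):
    # Serialize the embeddings back to the client.
    return json.dumps({"embeddings": prediction})
```

If a handler function raises (or is missing) at load time, the endpoint can report "InService" yet fail every invocation, so checking the CloudWatch logs for the container is usually the first debugging step.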
-
### System Info
- text-generation-inference version: 2.2.0
- model: "mistralai/Mixtral-8x7B-Instruct-v0.1"
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [X] An officially supported c…
-
Hello, thank you for sharing your work.
I have an issue regarding text editing: I want to run only the text-editing part on some data I have, and I tried to use the inference code. Th…
-
When using: **Mistral 7b Text Completion - Raw Text training full example.ipynb**
**Last block errors with:**
```
Exception in thread Thread-17 (generate):
Traceback (most recent call last):
  File…
```
-
### System Info
A100-80GB * 4
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [X] An officially supported command
- [ ] My own modifications
### Reproduction
```shell
docker ru…
```
-
Dear all,
I failed to run Llama-2-7b-chat-hf on an NPU; please give me a hand.
1. I converted the model with the command below and got two models,
a) optimum-cli export openvino --task text-generation -m Meta-…
-
### Related issues
_No response_
### Possible solution
Repository: https://github.com/huggingface/text-generation-inference
API documentation: https://huggingface.github.io/text-generation-inference/
### Help with development
- [ ] I'm willing to help with development!
### Additional …
-
Hello,
I followed the system setup instructions and tried to build the text-generation-inference container on my Jetson Orin 8GB running JetPack 5.1, but I seem to be running into the following err…
-
### Feature request
Currently a new trace is created for each HTTP request. It would be useful if the trace were taken (when available) from the request's `traceparent` header, as defined in https://opentelem…
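The W3C Trace Context `traceparent` header the request refers to has a fixed shape: `version-traceid-parentid-flags` with 2, 32, 16, and 2 lowercase hex digits respectively. A hedged sketch of extracting it from an incoming request, so the server could continue the caller's trace instead of starting a new one (the function name is illustrative, not TGI's API):

```python
import re

# traceparent = version(2) - trace-id(32) - parent-id(16) - flags(2), all hex.
TRACEPARENT_RE = re.compile(
    r"^(?P<version>[0-9a-f]{2})-(?P<trace_id>[0-9a-f]{32})"
    r"-(?P<parent_id>[0-9a-f]{16})-(?P<flags>[0-9a-f]{2})$"
)

def parse_traceparent(header):
    """Return (trace_id, parent_id, sampled) or None if the header is invalid."""
    m = TRACEPARENT_RE.match(header.strip())
    # All-zero trace-id / parent-id values are invalid per the spec.
    if not m or m["trace_id"] == "0" * 32 or m["parent_id"] == "0" * 16:
        return None
    sampled = int(m["flags"], 16) & 0x01 == 1
    return m["trace_id"], m["parent_id"], sampled

parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
# → ("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", True)
```

Returning `None` on a malformed header matches the spec's guidance to start a fresh trace rather than fail the request.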