-
### Describe the bug
When using some models that are not served via TGI (e.g., `google/flan-t5-large`), generation hangs indefinitely.
(Related: https://github.com/deepset-ai/haystack/issues/6816)
…
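A minimal sketch of a client-side timeout as a workaround, assuming the model is reached through the Hugging Face Inference API via `huggingface_hub` (the model id comes from the report; the prompt and timeout value are placeholders):

```python
from huggingface_hub import InferenceClient

# Without a timeout, a backend that never returns can block the caller forever;
# `timeout` bounds how long a single request may hang.
client = InferenceClient(model="google/flan-t5-large", timeout=30.0)

try:
    out = client.text_generation(
        "Translate to German: Hello, world!",  # placeholder prompt
        max_new_tokens=50,
    )
    print(out)
except Exception as err:
    # A read timeout surfaces here instead of hanging indefinitely.
    print(f"Request failed or timed out: {err}")
```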
-
Thank you for this amazing tool.
I intend to use this tool to transform my PDF data into a fine-tuning dataset for LLAMA-7B. Presently, when using TGI version 1.2.0 for the inference API, t…
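For context, a minimal sketch of sending one generation request to a TGI `/generate` endpoint over REST (the endpoint URL, prompt, and parameters are placeholder assumptions):

```python
import requests

TGI_URL = "http://localhost:8080/generate"  # assumed default local TGI port

payload = {
    "inputs": "Summarize the following PDF excerpt: ...",  # placeholder prompt
    "parameters": {"max_new_tokens": 200},
}
resp = requests.post(TGI_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["generated_text"])
```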
-
This is a ticket to track a wishlist of items you wish LiteLLM had.
# **COMMENT BELOW 👇**
### With your request 🔥 - if we have any questions, we'll follow up in comments / via DMs
Respond …
-
### System Info
- AWS `sagemaker` 2.163.0
- g5.12xlarge instance type with 4 NVIDIA A10G GPUs and 96GB of total GPU memory (4 × 24GB); a deployment sketch follows the checklist below
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [ …
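For reference, a minimal sketch of deploying the TGI container to a SageMaker endpoint on this instance type (the IAM role and model id are hypothetical placeholders):

```python
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical role
image_uri = get_huggingface_llm_image_uri("huggingface")  # TGI LLM container

model = HuggingFaceModel(
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "google/flan-t5-xl",  # placeholder model id
        "SM_NUM_GPUS": "4",  # shard across the four A10G GPUs
    },
    role=role,
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
)
print(predictor.predict({"inputs": "Hello"}))
```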
-
**Submitting author:** @lexcomber (Alexis Comber)
**Repository:** https://github.com/lexcomber/stgam
**Branch with paper.md** (empty if default branch): description
**Version:** 0.0.1.1
**Editor:** Pe…
-
Subscribe to this issue and stay notified about new [weekly trending repos in Jupyter Notebook](https://github.com/trending/jupyter-notebook?since=weekly).
-
### System Info
serverless inference endpoints
### Information
- [ ] Docker
- [X] The CLI directly
### Tasks
- [X] An officially supported command
- [ ] My own modifications
### Reproduction
Qu…
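For context, a minimal sketch of querying a serverless Inference API endpoint, assuming the report concerns the hosted Hugging Face endpoints (the model id and token are placeholders):

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/google/flan-t5-large"
HEADERS = {"Authorization": "Bearer hf_xxx"}  # hypothetical token

resp = requests.post(API_URL, headers=HEADERS, json={"inputs": "Hello"}, timeout=30)
resp.raise_for_status()
print(resp.json())
```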
-
Locally hosted models will perform differently and won't necessarily have a strict limit. Maybe in this setting, we should send requests sequentially rather than asynchronously, i.e., use the `query` method of the model …
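A minimal sketch of that sequential fallback next to the async path it would replace (the `query` method comes from the comment above; `model` and `prompts` are placeholders):

```python
import asyncio

def generate_sequentially(model, prompts):
    # One request in flight at a time: gentler on a locally hosted backend
    # that has no strict rate limit but limited concurrency headroom.
    return [model.query(p) for p in prompts]

async def generate_concurrently(model, prompts):
    # The async alternative: all requests dispatched at once.
    return await asyncio.gather(
        *(asyncio.to_thread(model.query, p) for p in prompts)
    )
```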
-
/kind bug
**What steps did you take and what happened:**
Create an InferenceService with a reference to a ServingRuntime that defines a container …
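To keep one language throughout, a minimal Python sketch (via the official `kubernetes` client) of creating an InferenceService that references a ServingRuntime; all names and the model location are hypothetical placeholders:

```python
from kubernetes import client, config

config.load_kube_config()  # assumes a local kubeconfig

isvc = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "example-isvc", "namespace": "default"},
    "spec": {
        "predictor": {
            "model": {
                "runtime": "example-servingruntime",  # hypothetical runtime
                "modelFormat": {"name": "sklearn"},  # placeholder format
                "storageUri": "s3://bucket/model",  # placeholder location
            }
        }
    },
}
client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="default",
    plural="inferenceservices",
    body=isvc,
)
```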
-
### System Info
Hi, I am working on a model that uses the ChatML format, which has:
- `\n` at the end of its response.
- `<|im_end|>` as the `eos_token`
# When finish reason is `eos_token`, output is correct…
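A minimal sketch of pinning the ChatML end-of-turn token as an explicit stop sequence, assuming a TGI endpoint queried through `huggingface_hub` (the URL and prompt are placeholders):

```python
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # assumed local TGI endpoint
out = client.text_generation(
    "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n",
    max_new_tokens=100,
    stop_sequences=["<|im_end|>"],  # stop on the ChatML end-of-turn token
)
print(out)
```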