-
I'm using TGI with Flan-T5 to process thousands of text extraction requests at a time, on a 4 x A6000 machine. My client class, which uses `AsyncInferenceClient`, can handle 900 requests at once, but …
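A minimal sketch of how such a concurrency cap can be wired with `asyncio.Semaphore`; here `client_call` is a placeholder for the actual `AsyncInferenceClient.text_generation` coroutine, and the limit of 900 is just the value that worked in this setup:

```python
import asyncio

# Cap on in-flight requests; 900 worked here, but the right value
# depends on the backend and hardware.
MAX_IN_FLIGHT = 900

async def generate(client_call, prompt, sem):
    # client_call stands in for e.g. AsyncInferenceClient.text_generation
    async with sem:
        return await client_call(prompt)

async def run_batch(client_call, prompts, limit=MAX_IN_FLIGHT):
    # One shared semaphore bounds concurrency across all tasks;
    # gather() preserves the order of the input prompts.
    sem = asyncio.Semaphore(limit)
    tasks = [generate(client_call, p, sem) for p in prompts]
    return await asyncio.gather(*tasks)
```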
-
### Feature request
Extend the current API to accept `hidden_states` as an optional input parameter in addition to `input_ids`. This would support integration with multi-modality models such as LLA…
-
### System Info
`ghcr.io/huggingface/text-generation-inference:2.0.3`
```
Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.75.0
Commit sha: 6073ece4fc2d7180c2057cb49b9…
```
-
When trying to load a model in a Pod whose memory limit is too low, the out-of-memory error message is swallowed by TGIS and is hard to troubleshoot (in addition to [Caikit swallowing the TGIS erro…
-
### Feature request
I've only just discovered TGI, but I don't see function calling documented anywhere, and my testing seems to confirm it is unsupported. I think the addition of function calling suppor…
-
### How to use GitHub
* Please use the 👍 [reaction](https://blog.github.com/2016-03-10-add-reactions-to-pull-requests-issues-and-comments/) to show that you are interested in the same f…
-
I am testing the Hugging Face preview package for .NET, but I cannot make use of `IChatCompletionService`:
it fails to resolve when I use the Hugging Face APIs rather than OpenAI.
```fshar…
-
Following up on #64, here would be the practical implementation I had in mind:
---
Imagine the following scenario:
You receive yet another GitHub issue (email) with a support request in disguise - …
-
There are a large number of records (over 60,000) that use obsolete language codes, which makes them hard to search for and also leaves their languages without translated labels. These records shoul…
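As an illustration of the kind of remapping involved, a minimal sketch; the mapping below covers only a few well-known deprecated ISO 639-1 codes and is not the full list the migration would need:

```python
# A few ISO 639-1 codes that were officially superseded
# (illustrative excerpt, not exhaustive).
OBSOLETE_TO_CURRENT = {
    "iw": "he",  # Hebrew
    "in": "id",  # Indonesian
    "ji": "yi",  # Yiddish
}

def modernize(code: str) -> str:
    """Return the current code for an obsolete one, else the code unchanged."""
    return OBSOLETE_TO_CURRENT.get(code, code)
```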
-
Mixtral Instruct AWQ vLLM API by Trelis
vllm/vllm-openai:latest
Runpod:
* 1 x A100 80GB
* 16 vCPU, 125 GB RAM
* 50 GB disk
* 150 GB Pod volume
Container log fills with these errors:
2024-01-23T03:2…