-
Someone approached me on IRC and asked how to use text output in TGI. Should be easy enough, I thought. Until I tried to add some simple text output to the existing TGI sample. Apparently not o…
-
Hi - really interesting work. We're currently using HF TGI in production and exploring using this instead. Are there plans to add things like typical_p that transformers supports? Would greatly ease t…
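For context, `typical_p` in transformers enables locally typical sampling: tokens whose surprisal is closest to the entropy of the next-token distribution are kept until their cumulative probability reaches `typical_p`. A minimal pure-Python sketch of that filter (the function name and list-based shapes here are illustrative, not the actual transformers or TGI implementation):

```python
import math

def typical_filter(probs, typical_p=0.95):
    """Sketch of the locally typical sampling filter behind `typical_p`.

    Keeps the tokens whose surprisal (-log p) is closest to the entropy
    of the full distribution, until their cumulative mass >= typical_p.
    Returns the kept token indices, most typical first.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Score each token by how far its surprisal deviates from the entropy.
    scored = sorted(
        (abs(entropy + math.log(p)), i, p)
        for i, p in enumerate(probs)
        if p > 0
    )
    kept, cum = [], 0.0
    for _, i, p in scored:
        kept.append(i)
        cum += p
        if cum >= typical_p:
            break
    return kept
```

In practice one would apply this mask to logits before sampling, exactly as `top_p` filtering is applied, which is why exposing it in a TGI-style server is a small change.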
-
### Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a sim…
-
### Feature request
Support the recent larger embedding models of 7B or more parameters (20x larger than BERT-large)
### Motivation
Embedding models have become much larger than before in the pas…
-
Greetings, @cipher982!
Currently we are working on the OpenVINO inference framework, and such benchmarks are critical for understanding the gaps and differences between our framework and Transformers/TGI …
-
Request
The ask is to introduce an OpenAI text generation API compatibility layer (chat completion endpoint) to kserve/TGIS.
Why
Having an OpenAI API compatibility layer will allow more open sourc…
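To illustrate the kind of shim involved, here is a minimal sketch of translating an OpenAI-style chat request into a flat prompt for a plain text-generation backend. The `chat_to_prompt` helper and the role-prefix format are hypothetical; only the `messages` request shape follows the OpenAI chat completions convention, and none of this is an actual kserve/TGIS API:

```python
def chat_to_prompt(messages):
    """Hypothetical adapter: flatten OpenAI-style chat messages
    (dicts with "role" and "content") into a single prompt string
    that a plain text-generation backend can consume."""
    lines = [f"{m['role']}: {m['content']}" for m in messages]
    lines.append("assistant:")  # cue the model to produce the reply
    return "\n".join(lines)

# Example OpenAI-style request body for POST /v1/chat/completions.
request = {
    "model": "some-model",
    "messages": [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "Hello"},
    ],
}
prompt = chat_to_prompt(request["messages"])
```

A real compatibility layer would additionally map sampling parameters (`temperature`, `max_tokens`, etc.) onto the backend's equivalents and wrap the generated text back into a chat-completion response object.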
-
Based on practical tests, deploying omost-llama-3-8b on an A100 using torch==2.3.0+cu118, vllm==0.5.0.post1+cu118, and xformers==0.0.26.post1+cu118 works well. If you want to speed up the process, you can ref…
-
**Describe the bug**
When changing the mapset inside a Python script, the temporal framework does not take this change into account and fails to connect to the temporal database, even though GRASS re…
-
### Feature request
Llama 3.1 is out and should be compatible with Neuron, however, it requires `transformers==4.43.1`, and `optimum-neuron` has pinned `transformers` to `4.41.1`.
Note that sin…
-
Do you have streaming functionality for auto-regressive LLMs? Something similar to Hugging Face TGI, for example.
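For comparison, TGI streams tokens to the client over server-sent events as they are generated. A minimal sketch of that pattern, assuming an SSE-style wire format (the chunk fields and the `[DONE]` sentinel here are assumptions modeled on common streaming APIs, not a specific TGI contract):

```python
import json

def stream_tokens(tokens):
    """Sketch of server-sent-events style token streaming: each
    generated token is sent as its own `data:` chunk, followed by
    a terminal sentinel so the client knows generation finished."""
    for t in tokens:
        yield f"data: {json.dumps({'token': t})}\n\n"
    yield "data: [DONE]\n\n"

# A web framework would pass this generator as a streaming response
# body; the client renders tokens as the chunks arrive.
chunks = list(stream_tokens(["Hello", ",", " world"]))
```

The point is that the server never buffers the full completion; each decode step can be flushed to the socket immediately, which is what makes interactive latency possible for auto-regressive models.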