-
It would be useful to be able to stream tokens to a client as they are generated, as other text-generation interfaces do. From what I've observed, this feature does not seem to be…
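For context, a minimal sketch of what client-side token streaming could look like. The generator and token source here are hypothetical stand-ins, not the project's actual API:

```python
import time

def generate_tokens(prompt):
    # Hypothetical stand-in for a model's incremental decode loop;
    # a real backend would yield tokens as the model produces them.
    for word in ("Streaming", "lets", "clients", "render", "output", "early"):
        time.sleep(0.01)  # simulate per-token latency
        yield word + " "

def stream_to_client(prompt):
    # Forward each token as soon as it arrives instead of waiting
    # for the full completion to finish.
    for token in generate_tokens(prompt):
        print(token, end="", flush=True)
    print()
```

The point of the pattern is that the caller can start rendering after the first token rather than after the last one.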
-
## Feature request
My playlist contains many internet radio stations. All stations send MP3 streams. In MPD, all the streams are noisy; a test with VLC and the same sources was successful. What ca…
-
Code:
```python
def download_pdf(url, paperId):
"""Download specified PDFs if not in repository"""
filename = "pdfs/" + paperId + ".pdf"
# Check if the file already exists
    if os.pa…
```
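The snippet is cut off; a self-contained sketch of the behavior the docstring describes (skip the download when the PDF is already in the repository) might look like this. `urllib` and the `pdf_path` helper are assumptions for illustration; the original code may use a different HTTP client:

```python
import os
import urllib.request

def pdf_path(paper_id, repo_dir="pdfs"):
    # Build the repository path for a paper's PDF
    return os.path.join(repo_dir, paper_id + ".pdf")

def download_pdf(url, paper_id, repo_dir="pdfs"):
    """Download the specified PDF if it is not already in the repository."""
    filename = pdf_path(paper_id, repo_dir)
    if os.path.exists(filename):
        return filename  # already downloaded; skip
    os.makedirs(repo_dir, exist_ok=True)
    urllib.request.urlretrieve(url, filename)
    return filename
```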
-
Placeholder for prepping for the Zotero 7 release.
- [x] install.rdf → manifest.json
- [x] update.rdf → updates.json
- [x] XUL Overlays → bootstrap.js
- [x] chrome.manifest → runtime chrome regi…
-
### System Info
Docker Image: ghcr.io/huggingface/text-generation-inference:sha-1734540
Instance: AWS A10G via Hugging Face Inference Endpoint
### Information
- [X] Docker
- [ ] The CLI directly
…
-
# Weekly GitHub Trending! (2024/04/22 ~ 2024/04/29)
## Python trending: 11 repos
### [meta-llama](https://github.com/meta-llama) / [llama3](https://github.com/meta-llama/llama3)
Official Meta Llama 3 GitHub …
-
If streaming is enabled, generation slows down significantly.
The following script:
```python
Runs = 4
def request(stream:bool):
    client = openai.Client(api_key="foobar", base_url=EN…
```
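The script is truncated, so to illustrate the kind of measurement involved, here is a self-contained timing harness with a stubbed token source standing in for the real endpoint. Everything below (names, delays, token count) is a hypothetical stand-in, not the issue author's actual code:

```python
import time

def stub_tokens(n=20, per_token=0.001):
    # Stand-in for the server: emits tokens with a fixed per-token delay
    for i in range(n):
        time.sleep(per_token)
        yield f"tok{i} "

def timed_request(stream: bool, n=20):
    """Return (text, time_to_first_token, total_time) for one request."""
    start = time.perf_counter()
    if stream:
        parts = []
        first = None
        for tok in stub_tokens(n):
            if first is None:
                first = time.perf_counter() - start  # latency to first token
            parts.append(tok)
        text = "".join(parts)
    else:
        text = "".join(stub_tokens(n))  # text only available once complete
        first = time.perf_counter() - start
    return text, first, time.perf_counter() - start
```

Comparing the total times of the two modes over several runs (as the original `Runs = 4` loop appears to do) is what would expose a streaming slowdown.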
-
The preferred way to run models is to stand up an inference server locally (e.g., Triton + TensorRT, vLLM, or TGI) and then hit it from HELM as an API. This way, HELM can benefit from all the crazy …
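As a concrete shape for that API call, such servers typically expose an OpenAI-compatible chat-completions schema (vLLM's OpenAI-compatible server and TGI's Messages API both do). A sketch of the payload only; the model name and defaults are illustrative, not HELM's actual configuration:

```python
import json

def build_chat_request(model: str, prompt: str, max_tokens: int = 128):
    # OpenAI-compatible /v1/chat/completions request body (sketch);
    # the client POSTs this JSON to the locally hosted server.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = json.dumps(build_chat_request("my-local-model", "Hello"))
```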
-
### Feature request
I am trying to run TGI on an HPC cluster. I tried pulling the Docker images with Singularity, but in that case the custom kernels do not work and CUDA complains…
-
### System Info
docker image 1.3.0
public runpod template: https://runpod.io/gsc?template=3uvdgyo0yy&ref=jmfkcdio
### Information
- [X] Docker
- [ ] The CLI directly
### Tasks
- [X] An offic…