-
### System Info
```bash
# GPU index to expose to the container and number of GPUs in use
gpu=0
num_gpus=1
model=meta-llama/Meta-Llama-3.1-8B-Instruct
# $token must hold a valid Hugging Face Hub token
docker run -d \
  --gpus "\"device=$gpu\"" \
  --shm-size 16g \
  -e HUGGING_FACE_HUB_TOKEN=$token \
  -p 8082:80 …
-
**Task -** Flash Attention installation from source. [Completed]
**Run -** TGI 2.3.1 with models that have Flash Attention enabled. [Issue does not occur in TGI 2.2.0]
**Error -**
2024-1…
-
### 🚀 The feature, motivation and pitch
Request for dynamic download of LoRA adapters from S3 or the HF Hub, based on the adapter id passed in the request's `model` field.
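For illustration, a minimal sketch of how such a request might look from the client side, assuming the adapter id rides along in the `model` field of TGI's OpenAI-compatible endpoint; the port, adapter id, and resolution behavior below are illustrative assumptions, not a confirmed API:
```python
import requests

# Hypothetical sketch of the requested behavior: the server would resolve the
# adapter named in the `model` field and download it from S3 or the HF Hub on
# first use. The adapter id below is illustrative, not a real repository.
resp = requests.post(
    "http://localhost:8082/v1/chat/completions",
    json={
        "model": "some-org/my-lora-adapter",
        "messages": [{"role": "user", "content": "What is Deep Learning?"}],
    },
)
print(resp.json())
```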
### Alternatives
No alternatives …
-
It is a great plugin and I love it, but I found an error:
```
[LLM] http error: error sending request for url (http://localhost:11434/api/generate): connection closed before message completed
…
-
The speed difference compared to https://huggingface.co/chat/ is astounding when running llama2-70b-chat.
I wonder what I am doing wrong. I have A100 GPUs, but the maximum on a single node is 4, …
-
Please add Qwen2 support
```
EETQ_CAUSAL_LM_MODEL_MAP = {
"llama": LlamaEETQForCausalLM,
"baichuan": BaichuanEETQForCausalLM,
"gemma": GemmaEETQForCausalLM
}
```
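For illustration, a hedged sketch of what the requested change might look like; `Qwen2EETQForCausalLM` does not exist in the map today, so the class name is a hypothetical by analogy with the existing wrappers:
```python
# Hypothetical sketch only: Qwen2EETQForCausalLM is assumed by analogy with
# the existing wrapper classes and is not a confirmed class in the codebase.
EETQ_CAUSAL_LM_MODEL_MAP = {
    "llama": LlamaEETQForCausalLM,
    "baichuan": BaichuanEETQForCausalLM,
    "gemma": GemmaEETQForCausalLM,
    "qwen2": Qwen2EETQForCausalLM,  # requested addition (class name assumed)
}
```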
-
When running the comps/llms/summarization/tgi/langchain docker container and passing the `"streaming": false` parameter in the curl request:
`curl http://${your_ip}:9000/v1/chat/docsum -X POST -d '{"q…
-
Got a work-in-progress brewing for a Plus/4 port of PlatoTerm, though it is contingent on a couple of bugfixes in VICE and CC65.
* PlatoTerm code is in the [port/plus4 branch](https://github.com/rh…
-
**Snakemake version**
7.10.0
**Describe the bug**
When gurobipy is installed in the conda environment, snakemake will use pulp to check whether a license is available before running your snakema…
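As a quick way to see this in action, the sketch below lists the solvers PuLP detects in the active environment; with gurobipy installed, GUROBI appears in the result, which is when the license probe can fire. This is a diagnostic sketch, not Snakemake's internal code:
```python
import pulp

# List the solvers PuLP considers available in this environment. With
# gurobipy installed, "GUROBI" shows up here, and availability checking
# involves contacting the license.
print(pulp.listSolvers(onlyAvailable=True))
```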