-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md)…
-
-
```bash
docker run -it --name vllm_service -p 8008:80 -e HF_TOKEN=${HUGGINGFACEHUB_API_TOKEN} -v ./data:/data vllm:cpu /bin/bash -c "cd / && export VLLM_CPU_KVCACHE_SPACE=40 && python3 -m vllm.entrypoints.…
```
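Once that container is up, vLLM serves an OpenAI-compatible HTTP API on the mapped host port (8008 in the command above). A minimal sketch of the request payload; the model name is a placeholder assumption, not taken from the command:

```python
import json

# Sketch of a request body for vLLM's OpenAI-compatible completions
# endpoint, assuming the container above is serving on host port 8008.
# The model name is a placeholder; use whatever the server was launched with.
payload = {
    "model": "PLACEHOLDER_MODEL",
    "prompt": "What is vLLM?",
    "max_tokens": 32,
    "temperature": 0.0,
}
body = json.dumps(payload)
# POST `body` to http://localhost:8008/v1/completions with header
# Content-Type: application/json (via curl, requests, etc.).
```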
-
- [ ] [classifiers/README.md at main · blockentropy/classifiers](https://github.com/blockentropy/classifiers/blob/main/README.md?plain=1)
# classifiers/README.md
## Fast Classifiers for Prompt Rout…
-
I converted the models to `float32` using this script: https://gist.github.com/pcuenca/23cd08443460bc90854e2a6f0f575084, but found precision problems when targeting `float16`. It'd be interesting to s…
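One way to spot where a `float16` target breaks down is to round-trip the weights through `float16` and look for overflow and relative error. A minimal sketch; the tensor and threshold are illustrative, not taken from the conversion script above:

```python
import numpy as np

# Round-trip float32 values through float16 and report (a) how many
# overflow to inf (|x| > 65504, float16's max finite value) and
# (b) the worst relative error among the values that survive.
def fp16_roundtrip_report(w: np.ndarray) -> tuple[int, float]:
    rt = w.astype(np.float16).astype(np.float32)
    finite = np.isfinite(rt)
    n_overflow = int((~finite).sum())
    rel_err = np.abs(rt[finite] - w[finite]) / np.maximum(np.abs(w[finite]), 1e-12)
    return n_overflow, float(rel_err.max())

w = np.array([1.0, 70000.0, 3.1e-5], dtype=np.float32)
n_overflow, max_rel = fp16_roundtrip_report(w)
# 70000.0 overflows float16; the subnormal-range 3.1e-5 survives with small error.
```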
-
Hello, I have a server with 15 CPUs and 30 GB RAM.
When I try to run ingest.py in CPU mode, I receive an error like this:
```
Exception: A process in the process pool was terminated abruptly while the futur…
```
-
The model used for ChatQnA supports BFLOAT16, in addition to TGI's default 32-bit float type: https://huggingface.co/Intel/neural-chat-7b-v3-3
TGI memory usage halves from 30GB to 15GB (and also it…
-
```python
model_name = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = Model()
model.init(model_name, use_quant=True, weight_dtype="in…
```
-
Using the latest versions of retriever-usvc, curl fails with error code `52` (empty reply from server) when running the following command:
```bash
kubectl -n test-app-chatqna exec client-test-7b7f97ddd9-fjdvv -- curl ht…
```
-
- [ ] [RichardAragon/MultiAgentLLM](https://github.com/richardaragon/multiagentllm)
# RichardAragon/MultiAgentLLM
**DESCRIPTION:** "Multi Agent Language Learning Machine (Multi Agent LLM)
(Update)…