-
### Is there an existing issue / discussion for this?
- [x] I have searched the existing issues / discussions
### Is this question already answered in the FAQ?
-
### What is the issue?
The main_gpu option is not working as expected.
My system has two GPUs. I sent the following request to `/api/chat`:
```
{
  "model": "llama3.1:8b-instruct-q8_0",
  "message…
```
-
### What behavior of the library made you think about the improvement?
This issue is just meant as a Q&A, as I couldn't find anything specific on this.
The question is why there is a dependenc…
-
### The following must be checked before submitting
- [X] Make sure you are using the latest code from the repository (git pull)
- [X] I have read the [project documentation](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki) and the [FAQ section](https://github.com/ymcui/Chinese-LLaMA-Alpaca-3/wiki/常见问题), and have already searched the issues for this…
-
A recent [paper](https://arxiv.org/pdf/2309.17453.pdf) by Meta/MIT/CMU proposed [StreamingLLM](https://github.com/mit-han-lab/streaming-llm/), a simple yet efficient solution to enable "infinite" cont…
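As a rough illustration of the idea, the sketch below shows the cache policy StreamingLLM describes: a few initial "attention sink" tokens are kept forever, a rolling window keeps the most recent tokens, and everything in between is evicted. The class and field names are invented for illustration, and the sketch omits the positional re-indexing the paper also applies.

```
from collections import deque

class SinkCache:
    """Toy KV-cache eviction policy in the style of StreamingLLM."""

    def __init__(self, n_sink=4, window=2044):
        self.n_sink = n_sink                 # tokens kept permanently
        self.sinks = []                      # KV entries for sink tokens
        self.recent = deque(maxlen=window)   # rolling window of KV entries

    def append(self, kv_entry):
        # The first n_sink tokens become permanent sinks; later tokens
        # enter the rolling window, which evicts its oldest entry itself.
        if len(self.sinks) < self.n_sink:
            self.sinks.append(kv_entry)
        else:
            self.recent.append(kv_entry)

    def cache(self):
        # The effective context the model attends over: sinks + window.
        return self.sinks + list(self.recent)
```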
-
I’ve discovered a performance gap between the Neural Speed Matmul operator and the Llama.cpp operator in the Neural-Speed repository. This issue was identified while running a benchmark with the ONNXR…
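The benchmark description above is cut off, so the following is only a generic timing-harness sketch of the kind such an operator comparison needs (warm-up runs, best-of-N timing); the matrix shapes and the NumPy matmul are placeholders for calls into the two operators being compared:

```
import time
import numpy as np

def bench(fn, warmup=5, iters=50):
    """Return the best wall-clock time in seconds over `iters` runs."""
    for _ in range(warmup):      # warm caches before timing
        fn()
    best = float("inf")
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - t0)
    return best

# Placeholder shapes; swap the lambda for the Neural Speed and
# llama.cpp matmul calls under comparison.
a = np.random.rand(4096, 4096).astype(np.float32)
b = np.random.rand(4096, 4096).astype(np.float32)
print(f"matmul best: {bench(lambda: a @ b) * 1e3:.2f} ms")
```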
-
I installed `llama-cpp-python` on a system with:
**CPU AMD EPYC 7542**
**GPU V100**
But it raised the exception shown in the image below:
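For anyone trying to reproduce this, a minimal load that exercises GPU offload in llama-cpp-python looks roughly like the sketch below; the model path is a placeholder, and `n_gpu_layers=-1` requests full offload, which only works when the package was built with CUDA support:

```
from llama_cpp import Llama

# Minimal reproduction sketch: load a GGUF model with full GPU offload.
# The model path is a placeholder; n_gpu_layers=-1 offloads every layer.
llm = Llama(model_path="/path/to/model.gguf", n_gpu_layers=-1)
out = llm("Hello", max_tokens=16)
print(out["choices"][0]["text"])
```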
-
It does not start with the Llama 3.1 model. Is it possible to make changes so that it works with Llama 3.1? This is currently the model with the largest context window (128K tokens), and it will potentially be used everywhere.
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of…
-
I want to deploy it via ollama, so I first converted it to a .gguf file with llama.cpp's convert_hf_to_gguf.py, but I got a KeyError for "", and I found that it is not in the added_tokens_decoder of tokenizer_c…
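One way to see which token IDs the tokenizer config actually declares before re-running the conversion is to inspect `added_tokens_decoder` directly. This is a minimal sketch assuming a standard Hugging Face layout with a `tokenizer_config.json` in the model directory (the path is a placeholder):

```
import json
from pathlib import Path

# List the special tokens declared in added_tokens_decoder; entries map
# token IDs to dicts that include the token's "content" string.
config = json.loads(Path("/path/to/model/tokenizer_config.json").read_text())
decoder = config.get("added_tokens_decoder", {})
for token_id, entry in sorted(decoder.items(), key=lambda kv: int(kv[0])):
    print(token_id, entry.get("content"))
```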