-
I am trying to execute the following script:
from llama_cpp import Llama
llm = Llama(model_path="~/llama-2-7b.ggmlv3.q8_0.bin", n_gqa=8)
output = llm("Q: Name the planets in the solar sy…
-
Background: inference for large language models is expensive, largely because of the memory-bandwidth cost of loading keys and values. Grouped-Query Attention (GQA) is an interpolation between multi-query and multi-head attention: it achieves quality close to multi-head attention at a speed comparable to multi-query attention.
Paper link: https://arxiv.org/pdf/2305.13245.pdf
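The interpolation described above can be sketched in plain NumPy: with `n_heads` query heads but only `n_kv_heads` key/value heads, each KV head is shared by a group of `n_heads / n_kv_heads` query heads (setting `n_kv_heads = n_heads` recovers multi-head attention; `n_kv_heads = 1` recovers multi-query attention). This is an illustrative sketch, not the paper's or llama.cpp's implementation; all names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def grouped_query_attention(q, k, v, n_heads, n_kv_heads):
    """Sketch of GQA for one sequence (no batching, no causal mask).

    q: (T, n_heads * d)      -- one query projection per head
    k, v: (T, n_kv_heads * d) -- fewer key/value projections
    """
    T = q.shape[0]
    d = q.shape[1] // n_heads
    group = n_heads // n_kv_heads  # query heads sharing each KV head
    q = q.reshape(T, n_heads, d)
    k = k.reshape(T, n_kv_heads, d)
    v = v.reshape(T, n_kv_heads, d)
    # Broadcast each KV head across its group of query heads.
    k = np.repeat(k, group, axis=1)
    v = np.repeat(v, group, axis=1)
    out = np.empty_like(q)
    for h in range(n_heads):
        scores = (q[:, h] @ k[:, h].T) / np.sqrt(d)
        out[:, h] = softmax(scores) @ v[:, h]
    return out.reshape(T, n_heads * d)
```

The memory saving comes from the KV cache: only `n_kv_heads * d` values per token need to be stored and streamed, rather than `n_heads * d`.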
-
# URL
- https://arxiv.org/abs/2305.13245
# Affiliations
- Joshua Ainslie, N/A
- James Lee-Thorp, N/A
- Michiel de Jong, N/A
- Yury Zemlyanskiy, N/A
- Federico Lebrón, N/A
- Sumit Sanghai, …
-
Can trained models be provided, especially for the GQA dataset?
-
### Describe the bug
Model URL:
https://huggingface.co/bartowski/Hubble-4B-v1-GGUF/discussions/1
llama_model_loader: - kv 26: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ"…
-
Why is this project code reporting errors in the CLEVR dataset?
The question is:
Traceback (most recent call last):
File "/mnt/public/home/s-xuk/mcan-gqa/run.py", line 160, in
execution.run…
-
I want to convert this small 1.1B llama2 architecture model [PY007/TinyLlama-1.1B-intermediate-step-240k-503b](https://huggingface.co/PY007/TinyLlama-1.1B-intermediate-step-240k-503b) to llama2.c vers…
-
Hi,
Is it possible to provide the details on how the first version was evaluated on benchmarks such as GQA or AOK-VQA in Table 6 of the paper?
Thanks
-
I have tried LoRA (perhaps my LoRA setup was not great) and tried freezing part of the weights. Does the author have any suggestions?
-
Hello, could you please tell me the training time on HICO and GQA, and which GPU was used? Thanks!