-
Great work going on with GGML. Bravo to so many contributors. You are champions!
Maybe more performance (on CPU) can be had by bringing sparsity into the workflow. Here is one of the many efforts…
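If it helps make the idea concrete, here is a minimal Python sketch (not GGML code; the sizes and the 50% density are illustrative assumptions) of why a sparse format pays off on CPU: a CSR matrix-vector product only touches stored nonzeros, so work scales with density rather than with the full weight matrix.
```python
import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32)
W[rng.random(W.shape) < 0.5] = 0.0     # prune ~50% of the weights (illustrative)

W_sparse = csr_matrix(W)               # stores only the nonzeros
x = rng.standard_normal(1024).astype(np.float32)

# The sparse matvec does work proportional to the nonzero count,
# which is where a sparse CPU inference path could save time.
y_sparse = W_sparse @ x
y_dense = W @ x
assert np.allclose(y_sparse, y_dense, atol=1e-2)
```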
-
Hello, I have tried lots of different version combinations to make the LLaMA script work, but it produces very bad results, which is
also what I observed with my own implementation and some other implemen…
-
Hi @Eric-mingjie,
I am also facing the same issue (as [#51]) when trying to prune llama-2-7b-chat-hf.
Here's the command:
`python main.py --model meta-llama/Llama-2-7b-chat-hf --prune_method…
-
Hi! Thanks for your great work!
I'm a little confused about the implementation. Your simple and efficient method only requires a single forward pass to compute the activations of each layer. [This line](ht…
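For reference, a minimal sketch of that single-pass idea (not the repository's code; `collect_activation_norms` and the calibration batch are hypothetical names) using PyTorch forward hooks to gather per-channel input activation norms for every linear layer in one forward pass:
```python
import torch

def collect_activation_norms(model, calib_batch):
    """Hypothetical helper: gather per-input-channel L2 norms of each
    nn.Linear layer's input in a single forward pass, via forward hooks."""
    norms, handles = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            # inputs[0]: (..., in_features) -> flatten to (tokens, in_features)
            x = inputs[0].detach().reshape(-1, inputs[0].shape[-1]).float()
            norms[name] = norms.get(name, 0) + (x ** 2).sum(dim=0)
        return hook

    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            handles.append(module.register_forward_hook(make_hook(name)))

    with torch.no_grad():
        model(calib_batch)           # the single forward pass

    for h in handles:
        h.remove()
    return {name: s.sqrt() for name, s in norms.items()}
```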
-
When I try SparseGPT, it raises the following error:
Traceback (most recent call last):
  E:\pythonProject\pruning.py:77 in …
-
Hi,
I was wondering if you plan to release the sparsified Llama2 models publicly. In particular, I am interested in Llama2-70B with 50% unstructured sparsity.
Thanks!
-
With the increasing interest in using this library to train models originally trained by others (https://github.com/EleutherAI/gpt-neox/issues/896 https://github.com/EleutherAI/gpt-neox/issues/994 htt…
-
# Repo links
https://github.com/THUDM/ChatGLM-6B
https://github.com/mymusise/ChatGLM-Tuning
https://github.com/LianjiaTech/BELLE
## LLM quantization
https://zhuanlan.zhihu.com/p/616969812
- [SmoothQuant](htt…
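For readers following the SmoothQuant link above, a minimal NumPy sketch of its core idea (the function name and shapes are illustrative assumptions, not the paper's code): pick a per-channel scale s_j = max|X_j|^alpha / max|W_j|^(1-alpha), divide activations by s and multiply weights by s, which migrates quantization difficulty from activations to weights while leaving the layer output unchanged.
```python
import numpy as np

def smoothquant_scale(X, W, alpha=0.5):
    """Hypothetical illustration of SmoothQuant-style scale migration.
    X: (tokens, in_features) activations; W: (out_features, in_features) weights."""
    act_max = np.abs(X).max(axis=0)                 # per-input-channel activation range
    w_max = np.abs(W).max(axis=0)                   # per-input-channel weight range
    s = act_max ** alpha / np.maximum(w_max, 1e-5) ** (1 - alpha)
    s = np.maximum(s, 1e-5)                         # guard against zero scales
    # Y = (X / s) @ (W * s).T equals X @ W.T, so the output is unchanged;
    # the scaled activations have a flatter range and quantize more easily.
    return X / s, W * s

# Quick equivalence check:
rng = np.random.default_rng(0)
X, W = rng.standard_normal((8, 16)), rng.standard_normal((4, 16))
Xs, Ws = smoothquant_scale(X, W)
assert np.allclose(Xs @ Ws.T, X @ W.T)
```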
-
Is it possible to use this with Llama 2? I'm interested in improving the inference speed, so the accuracy loss doesn't matter right now.