-
SpinQuant: https://arxiv.org/abs/2405.16406
-
Background:
The [SpinQuant paper](https://arxiv.org/pdf/2405.16406) introduces a method of improving quantization by adding rotation matrices to the model weights that improve quantizatio…
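To make the idea concrete, here is a minimal PyTorch sketch (not the paper's actual code) of why folding an orthogonal rotation into a weight matrix leaves the full-precision output unchanged while changing the values the quantizer sees. The sizes, the random `R`, and the linear-layer convention are illustrative assumptions, not taken from the paper.

```python
import torch

torch.manual_seed(0)
hidden = 64                                  # hypothetical hidden size
W = torch.randn(hidden, hidden)              # stand-in for a linear weight (out x in)
x = torch.randn(4, hidden)                   # stand-in activations

# Random orthogonal matrix from a QR decomposition (SpinQuant learns/chooses better ones).
R, _ = torch.linalg.qr(torch.randn(hidden, hidden))

y_ref = x @ W.T                              # original layer output
y_rot = (x @ R) @ (W @ R).T                  # rotation folded into both weight and input

# R is orthogonal, so R @ R.T = I and the outputs match up to floating-point error.
assert torch.allclose(y_ref, y_rot, atol=1e-4)

# Quantization is then applied to the rotated weight (W @ R), whose value
# distribution is typically better behaved (fewer outliers) than the raw W.
```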
-
### 🐛 Describe the bug
Currently I'm trying to test the LLaMA 3.2 1B Instruct model as you guided.
I have already tested LLaMA 2 7B / LLaMA 3 8B with XNNPACK on the device side.
I faced some issues du…
-
Looks like the README file has the description and commands to run SpinQuant, not LLM QAT.
-
Hi, I appreciate your work! I have a question regarding the training cost. In the introduction it's mentioned that the training cost of LLaMA-2 7B is 1.3 hours on a single A100, but section 4.1 mentio…
-
When using the right hidden size for the rotation, the Llama 3 8B model performs better:
WIKITEXT2 PPL improves from 11.544 to 8.967.
But for the other models, while running the fake quant:
`pyt…
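As a point of reference for what "the right hidden size" means here, below is a hedged sketch: the rotation matrix has to be square with the model's hidden dimension, otherwise the matmul against the weights during fake quantization has mismatched shapes. The helper name and the hard-coded sizes (4096 for Llama 3 8B, 2048 for Llama 3.2 1B, per their configs) are illustrative, not taken from the report.

```python
import torch

def random_rotation(hidden_size: int) -> torch.Tensor:
    """Return a random orthogonal rotation sized to the model's hidden dimension."""
    q, _ = torch.linalg.qr(torch.randn(hidden_size, hidden_size))
    return q

# Illustrative hidden sizes; in practice they come from the model config
# (config.hidden_size), e.g. 4096 for Llama 3 8B and 2048 for Llama 3.2 1B.
for hidden_size in (4096, 2048):
    R = random_rotation(hidden_size)
    W = torch.randn(hidden_size, hidden_size)   # stand-in weight of matching width
    _ = W @ R                                    # shapes line up only when sizes match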
-
## Problem
We don't publish aarch64 Linux binaries, so right now we still install ao==0.1.
```
(myvenv) marksaroufim@rpi5:~/Dev/ao $ pip install torchao
Looking in indexes: https://pypi.org/simpl…
```
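A quick, hedged way to confirm what pip actually resolved on such a machine; this sketch only uses the standard library, and the expected outputs in the comments just restate the report above.

```python
import platform
from importlib.metadata import version

# On the Raspberry Pi 5 above this prints 'aarch64'.
print(platform.machine())
# With no aarch64 wheel published, pip falls back to the old 0.1 release.
print(version("torchao"))
```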
-
Model: Llama 3.2 1B (without any quantization)
I ran the app, loaded the model, and entered the input, but the following error appears in the middle of the output.
```shell
2024-10-16 17:11:17.7…
```