-
I was quantizing weights using:
```
graph.openStore(
  full_f16_path, flags: .truncateWhenClose
) { store in
  let keys = store.keys
  graph.openStore(
    f8_path,
    flags: .truncateWhen…
```
-
Hi,
This error occurred when I tried to quantize my ONNX model.
```
Traceback (most recent call last):
  File "quant.py", line 4, in <module>
    quantize(
  File "/usr/local/lib/python3.8/dist-packages…
```
-
File "/home/qx/.local/lib/python3.10/site-packages/awq/models/base.py", line 231, in quantize
self.quantizer.quantize()
File "/home/qx/.local/lib/python3.10/site-packages/awq/quantize/quantize…
-
## 🐛 Bug
![image](https://github.com/user-attachments/assets/5253f5fc-8cbb-4b9f-8e33-674865f09164)
## To Reproduce
Steps to reproduce the behavior:
1. Download Command-R-Plus (either varia…
-
### What happened?
I was running GGUF quantization on https://huggingface.co/pints-ai/1.5-Pints-16K-v0.1/tree/main
It's a bog-standard Llama model. It should have quantized, but it's failing.
### Name and Version
What…
-
**When I quantized the Qwen2-7B model (not fine-tuned) using the quantization code below, I got the following error:**
**Quantization code:**
```python
from awq import AutoAWQForCausalLM
from tr…
```
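The snippet above is cut off after the imports. For context, a complete AutoAWQ flow in the style of the project's examples looks roughly like this; the model path, output path, and config values are placeholders rather than the poster's actual settings.
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen/Qwen2-7B"  # placeholder: model to quantize
quant_path = "Qwen2-7B-awq"   # placeholder: output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibrate and quantize to 4-bit, then save weights and tokenizer.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```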
-
### Idea
Use int4 as the compression technique to fit larger models onto Navi machines, or possibly MI-series machines. Weights would be compressed using an encoding scheme that packs two 4-bit n…
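As a rough illustration of the packing scheme, here is a minimal NumPy sketch; the function names are made up for this example, and it assumes unsigned 4-bit codes with the low nibble stored first.
```python
import numpy as np

def pack_int4(values: np.ndarray) -> np.ndarray:
    """Pack pairs of unsigned 4-bit codes (0..15) into single bytes."""
    assert values.size % 2 == 0, "need an even number of nibbles"
    v = values.astype(np.uint8) & 0x0F
    return v[0::2] | (v[1::2] << 4)  # low nibble first, high nibble second

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_int4: recover both 4-bit codes from each byte."""
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed & 0x0F
    out[1::2] = packed >> 4
    return out
```
Packing this way halves weight storage relative to int8; the kernel pays for it with a shift-and-mask decode on each load.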
-
Hi @PanQiWei
I'd be most grateful if you could give me a bit of help.
I have been trying to quantize BLOOMZ 175B but can't currently get it done. BLOOMZ has 70 layers and totals 360GB.…
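For reference, the usual AutoGPTQ flow (per its README) is sketched below; the paths and the single calibration example are placeholders, and it says nothing about the offloading setup a 360GB model would actually need.
```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_path = "bigscience/bloomz"  # placeholder source model
quant_path = "bloomz-4bit"        # placeholder output directory

tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
# A real run needs a proper calibration set; one toy example shown here.
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(model_path, quantize_config)
model.quantize(examples)
model.save_quantized(quant_path)
```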
-
Hello!
I did some research (using llama.cpp) and found that quantizing the input and embed tensors to f16 and the other tensors to q5_k or q6_k gives excellent results, almost indistinguisha…
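For anyone reproducing this, recent llama.cpp builds let you override per-tensor types at quantization time. Below is a sketch of driving the quantize tool from Python; the flag names match recent builds but should be checked against `./llama-quantize --help` for your checkout.
```python
# Sketch: keep token-embedding and output tensors at f16, quantize the rest to q6_k.
import subprocess

subprocess.run(
    [
        "./llama-quantize",
        "--token-embedding-type", "f16",  # keep embeddings at f16
        "--output-tensor-type", "f16",    # keep the output tensor at f16
        "model-f16.gguf",                 # placeholder input GGUF
        "model-q6_k.gguf",                # placeholder output GGUF
        "q6_k",                           # type for the remaining tensors
    ],
    check=True,
)
```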
-
Package versions:
AutoAWQ: 0.2.5+cu118
torch: 2.3.1+cu118
transformers: 4.43.3
I was trying to quantize my fine-tuned Llama 3.1 405B (bf16) model to 4-bit using AutoAWQ, following the instructions in t…