-
Is it possible to perform `Quantization Aware Training` on Sentence Transformers, beyond [fp16 and bf16](https://github.com/huggingface/transformers/blob/main/src/transformers/training_args.py#L404-L4…
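For reference, a minimal sketch of the fp16/bf16 mixed-precision path that the linked flags control (assuming a recent `transformers`; the output directory is a placeholder, and this enables mixed precision rather than true QAT):
```
# Minimal sketch: the bf16/fp16 flags from transformers TrainingArguments.
# output_dir is a placeholder; this is mixed precision, not QAT.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",   # placeholder
    bf16=True,          # bfloat16 mixed precision (Ampere+ GPUs)
    # fp16=True,        # alternative: float16 mixed precision
)
```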
-
I find that the quantization losses are higher for GPT-J than for Llama, which seems to stay pretty low.
```
2023-06-20 19:05:19 INFO [auto_gptq.modeling._base] Quantizing attn.q_proj in layer 2/28...
…
```
-
Hi,
Have you tried quantizing Mamba? Do you plan on releasing quantized versions?
Can you share your thoughts on quantizing Mamba, given the sensitivity of the model's recurrent dynamics?
Thanks
-
Hello,
I have used your QAT model to quantize to different bitwidths, but I saw that the quantized weights were always FP values, even though they had been quantized (e.g., if I quantized to 4bit, then all my…
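For context, a minimal sketch (pure illustration, not tied to any particular repo) of why fake-quantized / QAT weights still show up as FP values: the quantize-dequantize round trip restricts the values to a low-bit grid but keeps the storage dtype floating point:
```
import torch

def fake_quantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Quantize-dequantize to a symmetric integer grid; output stays float."""
    qmax = 2 ** (bits - 1) - 1                                # 7 for 4-bit
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)  # integer grid
    return q * scale                                          # float storage

w = torch.randn(4, 4)
w_q = fake_quantize(w, bits=4)
print(w_q.dtype)  # torch.float32: values lie on a 4-bit grid, dtype is still FP
```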
-
Dear Authors,
Thanks for the great work.
After installing "ryzen-ai-1.2.0-20240726.msi", I can run on the NPU on the target platform.
However, there are some questions I would like to verify.
…
-
Package Version:
AutoAWQ: 0.2.5+cu118
torch: 2.3.1+cu118
transformers: 4.43.3
I was trying to quantize my finetuned llama3.1 405b (bf16) model to 4 bit using autoawq, following the instruction in t…
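For reference, the standard AutoAWQ 4-bit flow looks roughly like the sketch below (the model path, output path, and quant_config values are placeholders rather than anything taken from this issue):
```
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/finetuned-llama-3.1-405b"   # placeholder
quant_path = "llama-3.1-405b-awq"                 # placeholder output dir
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the bf16 checkpoint and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights to 4 bit
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized model and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```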
-
I found that device memory usage keeps increasing when executing basic_quant_mix.py, and it raises OOM when the model has a large number of parameters. How can this be optimized? Thank you~
@Qcompiler
-
Hi,
First of all, many, many thanks for this device, it is amazing; thank you for this brick you have added to this massive construction set called Ableton.
Just one issue: is there a way …
-
I am getting a "float division by zero" error whenever I try to quantize Mixtral-related models with AutoGPTQ; here is my code:
```
from transformers import AutoTokenizer, TextGenerationPipeli…
```
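For comparison, a typical AutoGPTQ quantization flow (mirroring the library's README) would look roughly like this; the Mixtral model id, calibration sentence, and config values below are illustrative, not taken from the issue:
```
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "mistralai/Mixtral-8x7B-v0.1"   # illustrative model id
quantized_dir = "mixtral-gptq-4bit"        # illustrative output dir

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# A tiny calibration set (real runs should use more, representative text)
examples = [tokenizer("auto-gptq is an easy-to-use model quantization library.")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_dir)
```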
-
**Describe the bug**
Quantization scales are defined to _always_ be positive in the [onnx documentation](https://iot-robotics.github.io/ONNXRuntime/docs/performance/quantization.html).
Creating a qd…
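For context, a minimal numeric sketch (pure illustration) of the affine quantization scheme those docs describe, where the scale comes out strictly positive:
```
import numpy as np

# Affine uint8 quantization: q = clip(round(x / scale) + zero_point, 0, 255).
# The scale derived below is positive whenever the input range is non-degenerate.
x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
scale = (x.max() - x.min()) / 255.0                 # > 0
zero_point = int(round(-x.min() / scale))

q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
x_dq = (q.astype(np.float32) - zero_point) * scale  # dequantize back to float
print(q, x_dq)
```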