-
FP8 or AWQ quant
-
https://huggingface.co/black-forest-labs/FLUX.1-Fill-dev/blob/main/flux1-fill-dev.safetensors
-
Hi all,
We recently developed a fully open-source quantization method called VPTQ (Vector Post-Training Quantization) [https://github.com/microsoft/VPTQ](https://github.com/microsoft/VPTQ) which en…
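A minimal inference sketch, assuming the `vptq` package is installed and a pre-quantized VPTQ-community checkpoint is used; the model id below is illustrative, and the exact loading API should be verified against the repo README:

```python
# Minimal sketch: load a VPTQ-quantized checkpoint for inference.
# Assumes `pip install vptq` and a pre-quantized VPTQ-community checkpoint;
# the model id is illustrative, not an exact name.
import transformers
import vptq

model_id = "VPTQ-community/Meta-Llama-3.1-8B-Instruct-v8-k65536-0-woft"  # illustrative id

tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
model = vptq.AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer(
    "Explain vector post-training quantization in one sentence.",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```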
-
**Describe the bug**
I'm compressing a qwen2.5_7b model using `examples/quantization_2of4_sparse_w4a16/llama7b_sparse_w4a16.py`, but I fail to load the stage_sparsity model. The error is shown belo…
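A minimal reload sketch, assuming the stage output is a standard `compressed-tensors` checkpoint and a recent `transformers` is installed; the path is a placeholder:

```python
# Minimal reload sketch for a checkpoint produced by an llm-compressor stage.
# Assumes the stage output directory is a standard Hugging Face checkpoint
# with compressed-tensors metadata; the path is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "./output/stage_sparsity"  # placeholder path to the stage checkpoint

tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(
    ckpt,
    device_map="auto",
    torch_dtype="auto",  # keep the dtype recorded in the checkpoint config
)
```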
-
Hello, I'm trying to use AIMET_TORCH to quantize an LLM, e.g. Llama v2. Where can I find a Jupyter notebook example that shows quantization simulation for an LLM?
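A minimal quantization-simulation sketch with `aimet_torch`, using a toy module as a stand-in for the LLM to keep it self-contained; the quant scheme, bitwidths, and calibration loop below are assumptions, not settings from an official notebook:

```python
# Minimal AIMET quantization-simulation sketch.
# A tiny torch module stands in for the LLM here just to keep the example
# self-contained; the QuantizationSimModel flow is the same idea.
import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

model = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 64),
).eval()

dummy_input = torch.randn(1, 64)

# Build the simulation model: inserts fake-quant ops around weights/activations.
sim = QuantizationSimModel(
    model,
    dummy_input=dummy_input,
    quant_scheme=QuantScheme.post_training_tf_enhanced,  # assumption
    default_param_bw=8,
    default_output_bw=8,
)

# Calibration pass so AIMET can compute quantization encodings (ranges/scales).
def calibrate(sim_model, _):
    with torch.no_grad():
        for _ in range(8):
            sim_model(torch.randn(1, 64))

sim.compute_encodings(forward_pass_callback=calibrate, forward_pass_callback_args=None)

# sim.model can now be evaluated to measure accuracy under simulated quantization.
out = sim.model(dummy_input)
```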
-
### 🚀 The feature, motivation and pitch
In the past, we padded int4 quantization when the group size did not evenly divide the weight dimension to make things work. Since we have decided to remove the padding, int4 quantization is n…
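A small plain-PyTorch sketch of group-wise int4 quantization that shows where the divisibility constraint comes from once padding is gone; the grouping and parameter names are illustrative, not the actual int4 kernels:

```python
# Group-wise int4 quantization sketch in plain PyTorch, to illustrate why the
# in-features dimension must be a multiple of the group size once the padding
# path is removed. Illustrative only; not the real int4 kernels.
import torch

def quantize_int4_groupwise(w: torch.Tensor, group_size: int = 128):
    out_features, in_features = w.shape
    if in_features % group_size != 0:
        # Previously this case was handled by padding in_features up to the
        # next multiple of group_size; without padding it has to be an error.
        raise ValueError(
            f"in_features ({in_features}) must be divisible by group_size ({group_size})"
        )

    groups = w.reshape(out_features, in_features // group_size, group_size)

    # Asymmetric per-group quantization to the 4-bit unsigned range [0, 15].
    w_min = groups.amin(dim=-1, keepdim=True)
    w_max = groups.amax(dim=-1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-8) / 15.0
    zero_point = (-w_min / scale).round()

    q = ((groups / scale) + zero_point).round().clamp(0, 15).to(torch.uint8)
    return q, scale, zero_point

# Works: 4096 is a multiple of 128.
q, s, z = quantize_int4_groupwise(torch.randn(32, 4096), group_size=128)
# Raises: 4000 is not a multiple of 128, and there is no padding fallback.
# quantize_int4_groupwise(torch.randn(32, 4000), group_size=128)
```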
-
I fine-tuned a Whisper large-v3 model via the [speechbrain](https://github.com/speechbrain/speechbrain) framework. I want to convert it to a `faster-whisper` model and run inference on it via `faster-whispe…
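A minimal conversion sketch, assuming the speechbrain fine-tune can first be saved as a Hugging Face Transformers-format Whisper checkpoint; paths, the audio file, and the quantization choice are placeholders:

```python
# Conversion sketch: Hugging Face Whisper checkpoint -> CTranslate2 format
# -> faster-whisper inference. Assumes the speechbrain fine-tune has already
# been exported as a Transformers-style Whisper checkpoint; paths are placeholders.
from ctranslate2.converters import TransformersConverter
from faster_whisper import WhisperModel

hf_dir = "./whisper-large-v3-finetuned-hf"    # placeholder: HF-format checkpoint
ct2_dir = "./whisper-large-v3-finetuned-ct2"  # placeholder: converter output

# Convert to CTranslate2 format (same as the ct2-transformers-converter CLI).
TransformersConverter(
    hf_dir,
    copy_files=["tokenizer.json", "preprocessor_config.json"],
).convert(ct2_dir, quantization="float16")

# Run inference with faster-whisper on the converted model.
model = WhisperModel(ct2_dir, device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav", beam_size=5)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```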
-
Quark is a comprehensive cross-platform toolkit designed to simplify and enhance the quantization of deep learning models. Supporting both PyTorch and ONNX models, Quark empowers developers to optimiz…
-
### 💡 Your Question
I have followed exactly the same steps for model training followed by PTQ and QAT mentioned in the official super-gradients notebook:
https://github.com/Deci-AI/super-gradients/blob…
-
Hi @shewu-quic ~
Could you tell me after which method call the model's actual physical size is reduced when we perform 8a8w quantization on the Llama-3.2-1B & 3B models using the QNN backend?
Thank …