-
I am trying out vlm_ptq by following the README in the vlm_ptq folder. When I run the command `scripts/huggingface_example.sh --type llava --model llava-1.5-7b-hf --quant fp8 --tp 8`, (--deployment com…
-
I am trying out vlm_ptq by following the README in the vlm_ptq folder. When I run the command `scripts/huggingface_example.sh --type llava --model llava-1.5-7b-hf --quant fp8 --tp 8`, the following error m…
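Under the hood the script drives ModelOpt's Python quantization API; as a rough, illustrative sketch of what FP8 PTQ looks like at that level (the tiny stand-in model and random calibration data below are assumptions, not what the script actually runs):

```python
# Hedged sketch of FP8 PTQ with NVIDIA ModelOpt's Python API.
# The small Linear model and random calibration batches are stand-ins for the real
# VLM and its calibration dataset, which huggingface_example.sh handles for you.
import torch
import modelopt.torch.quantization as mtq

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
calib_data = [torch.randn(8, 64) for _ in range(4)]

def forward_loop(m):
    # Calibration: run representative batches so activation ranges are recorded.
    for batch in calib_data:
        m(batch)

# FP8_DEFAULT_CFG quantizes weights and activations to FP8 with per-tensor scales.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)
```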
-
**Describe the bug**
Attempting to save PTQ-quantized `TorchVision` models from the `ptq_benchmark_torchvision.py` script, after amending the script to export the model with `export_torch_qcdq` as a final ste…
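For reference, a minimal sketch of the kind of amendment described, using Brevitas' `export_torch_qcdq` on a tiny stand-in model (this is not the actual `ptq_benchmark_torchvision.py` code, and the export path is illustrative):

```python
# Hedged sketch: exporting a Brevitas-quantized model with export_torch_qcdq.
# The single QuantConv2d layer stands in for the benchmarked TorchVision model.
import torch
from brevitas.nn import QuantConv2d
from brevitas.export import export_torch_qcdq

model = torch.nn.Sequential(QuantConv2d(3, 8, kernel_size=3, weight_bit_width=8))
model.eval()

dummy_input = torch.randn(1, 3, 32, 32)
# export_torch_qcdq traces the model into a TorchScript module built from
# quantize/cast/dequantize ops.
export_torch_qcdq(model, args=dummy_input, export_path="model_qcdq.pt")
```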
-
Hi, I love your great tutorials.
I have studied many SOTA PTQ papers for ViT, such as I-ViT, but I found that they are all based on simulated quantization (FakeQ).
I want to deploy that kind of external PTQ implemen…
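For context, "simulation (FakeQ)" here means values are rounded to the integer grid but kept in floating point, so no integer kernels actually run; a minimal illustration in plain PyTorch (not taken from any of the cited papers):

```python
# Illustration of fake (simulated) quantization: values are snapped to an int8 grid
# but stay in float32, so inference still runs on floating-point kernels.
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (x.abs().max() / qmax).clamp(min=1e-8)  # symmetric per-tensor scale
    q = torch.clamp(torch.round(x / scale), qmin, qmax)
    return q * scale                                 # dequantize back to float

x = torch.randn(4, 4)
print(fake_quantize(x) - x)  # small rounding error; dtype is still torch.float32
```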
-
### Right Case
When I follow the doc: https://github.com/pytorch/executorch/blob/main/examples/models/llama/README.md#enablement,
I export the Llama3.2-1B-Instruct:int4-spinquant-eo8 model to xnnpa…
-
### 🚀 The feature, motivation and pitch
Currently the QNN quantizer only supports PTQ (post-training quantization), and we'd like to enable QAT (quantization-aware training) for better quantization supp…
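For reference, the PT2E workflow already exposes a QAT entry point; a rough sketch of what QAT with the QNN quantizer might look like is below (the `QnnQuantizer` import path and its behavior under QAT are assumptions, and the graph-capture API varies by PyTorch version):

```python
# Hedged sketch of a PT2E QAT flow. prepare_qat_pt2e/convert_pt2e are the standard
# torch.ao APIs; whether QnnQuantizer's annotations work under QAT is the open question.
import torch
from torch.ao.quantization.quantize_pt2e import prepare_qat_pt2e, convert_pt2e
from executorch.backends.qualcomm.quantizer.quantizer import QnnQuantizer  # assumed path

model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
example_inputs = (torch.randn(1, 16),)

# Graph capture (export_for_training on recent PyTorch; older releases used
# capture_pre_autograd_graph).
exported = torch.export.export_for_training(model, example_inputs).module()

quantizer = QnnQuantizer()
prepared = prepare_qat_pt2e(exported, quantizer)  # inserts trainable fake-quant observers

# ... fine-tune `prepared` with the usual training loop ...

quantized = convert_pt2e(prepared)
```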
-
### 💡 Your Question
I have followed exactly the same steps for model training followed by PTQ and QAT as described in the official super-gradients notebook:
https://github.com/Deci-AI/super-gradients/blob…
-
**Describe the bug**
I tried to optimize a BERT model with bert_ptq_cpu.json, but it produced 7 output models.
Is there any way, or a change to the config, to get only one output model?
```
[2024-10-25 10:54:59,1…
-
Converting this dummy model with `quantize_target_type="int8"` and `per_tensor=True` throws an error in TFLite
```python
import torch.nn as nn
import torch
from tinynn.graph.quantization.quantizer …
-
### 🚀 The feature, motivation and pitch
I am trying to implement an eager mode for PT2E quantization on CPU. Currently, PT2E quantization on CPU is lowered to Inductor via `torch.compile`. The current…
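For context, a rough sketch of the existing PT2E-on-CPU flow that is lowered through Inductor today (the quantizer choice and graph-capture API are assumptions and vary by PyTorch version):

```python
# Hedged sketch of the current PT2E PTQ flow on CPU, lowered to Inductor via torch.compile.
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.x86_inductor_quantizer import (
    X86InductorQuantizer,
    get_default_x86_inductor_quantization_config,
)

model = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 32),)

# Graph capture (export_for_training on recent PyTorch; older releases used
# capture_pre_autograd_graph).
exported = torch.export.export_for_training(model, example_inputs).module()

quantizer = X86InductorQuantizer()
quantizer.set_global(get_default_x86_inductor_quantization_config())

prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)            # calibration pass
converted = convert_pt2e(prepared)

# Today the quantized graph is run through Inductor; the feature request is to be able
# to run `converted` eagerly instead of compiling it.
compiled = torch.compile(converted)
compiled(*example_inputs)
```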