-
Can activation quantization also be introduced in Hqq? If not, is there any process/method that can further quantize the activations after using Hqq to quantize the weights?
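For context, weight-only quantization and dynamic activation quantization are independent steps, so an int8 activation pass could in principle be layered on top of already-quantized weights at runtime. A minimal PyTorch sketch of that idea (the function names, the `W_deq` dequantized weight, and the per-tensor scaling scheme are illustrative assumptions, not Hqq's API):
```python
import torch

def quantize_activation_int8(x: torch.Tensor):
    # Symmetric per-tensor dynamic quantization: the scale comes from the
    # runtime max-abs of the activation, values are clamped to the int8 range.
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    x_q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return x_q, scale

def int8_act_linear(x: torch.Tensor, W_deq: torch.Tensor) -> torch.Tensor:
    # W_deq: weight already dequantized from its low-bit (e.g. Hqq) form.
    x_q, s_x = quantize_activation_int8(x)
    # Fake-quantized path: dequantize the activation and use a normal matmul.
    # A real int8 kernel would instead accumulate x_q @ W_q in int32.
    return (x_q.float() * s_x) @ W_deq.t()

x = torch.randn(4, 64)
W_deq = torch.randn(32, 64)
y = int8_act_linear(x, W_deq)
```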
-
#### I am using Onediff - ControlNet, loading the model in float16.
#### In your introduction, you used onediff int8, which is very effective in accelerating the model. I want to know whether this is applicable…
-
Hi,
Thank you for the design code. I just want to know whether your design uses INT8 quantization and MAC operations, or whether everything happens in FP32.
Thanks
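For reference, the distinction being asked about can be shown with a small, purely illustrative sketch (not taken from the design in question): an INT8 MAC path multiplies int8 operands and accumulates into int32 before rescaling, whereas an FP32 path dequantizes first and does everything in floating point.
```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(-128, 127, size=(4, 8), dtype=np.int8)   # int8 activations
w = rng.integers(-128, 127, size=(8, 3), dtype=np.int8)   # int8 weights
s_a, s_w = 0.02, 0.01                                      # example scales

# INT8 MAC path: int8 x int8 products accumulated in int32, rescaled at the end.
acc_int32 = a.astype(np.int32) @ w.astype(np.int32)
y_int8_path = acc_int32 * (s_a * s_w)

# FP32 path: dequantize first, then do the whole matmul in float32.
y_fp32_path = (a.astype(np.float32) * s_a) @ (w.astype(np.float32) * s_w)

print(np.allclose(y_int8_path, y_fp32_path))  # same math, different arithmetic
```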
-
Hi! I'm trying to quantize MobileNetV3 with tflite, but the int8 model performs very poorly. I think it is because of linear quantization, which is too simple a method and not appropriate for every weights distrib…
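One thing that often helps here is full-integer post-training quantization with a representative dataset, so activation ranges are calibrated on real inputs rather than guessed. A minimal sketch using the TFLite converter (the `model` and `calibration_images` names are placeholders, not from the original report):
```python
import tensorflow as tf

def representative_dataset():
    # Yield a few hundred real preprocessed inputs so the converter can
    # calibrate activation ranges instead of relying on defaults.
    for image in calibration_images[:200]:
        yield [image[None, ...].astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer quantization (weights and activations in int8).
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_int8_model = converter.convert()
with open("mobilenetv3_int8.tflite", "wb") as f:
    f.write(tflite_int8_model)
```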
-
### System Info
0.0.40 shipped the first version of embedding quant.
`--embedding-dtype int8`
This issue is looking for testers to verify the real-life performance of these features at real da…
-
Hi there,
I'm new to quantization. From my understanding, "8da4w" means that the weights are pre-quantized to 4 bits, and the activations are quantized to 8 bits at runtime. Following this, the GEM…
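That reading of "8da4w" matches the usual convention: 8-bit dynamic activation quantization with 4-bit weights. For illustration only (a sketch of the idea, not any particular library's kernel), the compute flow of such a linear layer might look like:
```python
import torch

def quantize_per_token_int8(x):
    # Dynamic (runtime) symmetric quantization, one scale per token/row.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    x_q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return x_q, scale

def linear_8da4w(x, w_q4, w_scale):
    # w_q4: int4 weights stored in an int8 container, range [-8, 7],
    # quantized offline; w_scale: per-output-channel weight scales.
    x_q, x_scale = quantize_per_token_int8(x)
    # Integer GEMM with int32 accumulation, then rescale back to float.
    acc = x_q.to(torch.int32) @ w_q4.to(torch.int32).t()
    return acc.float() * x_scale * w_scale.t()

x = torch.randn(2, 16)
w_q4 = torch.randint(-8, 8, (8, 16), dtype=torch.int8)   # stand-in 4-bit weights
w_scale = torch.full((8, 1), 0.05)
y = linear_8da4w(x, w_q4, w_scale)
```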
-
INT8 quantization works fine, but INT4 does not work.
![Capture](https://github.com/pytorch-labs/gpt-fast/assets/106262476/ac10df53-860e-4da9-b51e-1ad17e3fe3c4)
-
Repro command:
```
python generate.py --compile --compile_prefill --checkpoint_path checkpoints/$MODEL_REPO/model_int8.pth
```
Errors:
```
(pt) [ybliang@devgpu002.ash8 ~/local/gpt-fast (main)]…
```
-
Hi,
This error occurred when I tried to quantize my onnx model.
```
Traceback (most recent call last):
  File "quant.py", line 4, in <module>
    quantize(
  File "/usr/local/lib/python3.8/dist-packages…
```
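The traceback is cut off before the actual error, but for comparison, a typical dynamic-quantization call with ONNX Runtime's quantization tooling looks like the sketch below (assuming that is the library in use here; the file names are placeholders):
```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantization: weights are converted to int8 offline,
# activations are quantized on the fly at inference time.
quantize_dynamic(
    "model_fp32.onnx",
    "model_int8.onnx",
    weight_type=QuantType.QInt8,
)
```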
-
### System Info
CPU Architecture: x86_64
CPU/Host memory size: 1024Gi (1.0Ti)
GPU properties:
  GPU name: NVIDIA GeForce RTX 4090
  GPU mem size: 24Gb…