-
Hi, I am trying to use IPEX to quantize a UNet model following https://github.com/intel/intel-extension-for-pytorch/blob/v1.12.0/docs/tutorials/features/int8.md.
Now the model can be quantized, but the…
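For reference, this is roughly the static int8 flow that tutorial describes (IPEX v1.12 API); the tiny conv net and random calibration batches below are placeholders for the real UNet and data:
```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

# tiny conv net standing in for the UNet
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 3, 3, padding=1),
).eval()
example_input = torch.randn(1, 3, 64, 64)

# static int8 qconfig from the tutorial
qconfig = ipex.quantization.default_static_qconfig
prepared = prepare(model, qconfig, example_inputs=example_input, inplace=False)

# calibration: run a few representative batches through the prepared model
with torch.no_grad():
    for _ in range(4):
        prepared(torch.randn(1, 3, 64, 64))

quantized = convert(prepared)

# trace and freeze to materialize the optimized int8 graph
with torch.no_grad():
    traced = torch.jit.trace(quantized, example_input)
    traced = torch.jit.freeze(traced)
    out = traced(example_input)
```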
-
To add quantization support for the KV cache, stored in the state dict.
Static quantization (as for activations) is needed for performance.
Dynamic quantization can be added for completeness.
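To make the static-vs-dynamic distinction concrete, here is a minimal sketch on a stand-in KV tensor; the shapes and the calibrated scale are purely illustrative, not the actual cache layout:
```python
import torch

def quantize_int8(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # symmetric per-tensor int8 quantization
    return torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

kv = torch.randn(2, 16, 128, 64)     # stand-in K or V: [batch, heads, seq, head_dim]

# static: a scale calibrated offline and stored in the state dict, so the
# runtime path is a single multiply-round-clamp (cheap, good for performance)
static_scale = torch.tensor(0.05)    # assumed calibrated value
kv_static = quantize_int8(kv, static_scale)

# dynamic: the scale is derived from the live tensor, costing an extra
# reduction per step but adapting to the actual value range (completeness)
dynamic_scale = kv.abs().max() / 127.0
kv_dynamic = quantize_int8(kv, dynamic_scale)

print((dequantize(kv_static, static_scale) - kv).abs().max())
print((dequantize(kv_dynamic, dynamic_scale) - kv).abs().max())
```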
-
I see that there is full int8 support (both weights and activations) for BERT; it's not clear to me what is supported for GPT models ([here](https://github.com/NVIDIA/FasterTransformer/blob/main/exampl…
-
While testing OPT with `quant_lm_head=True`, here are the resulting weight keys after quantization:
`weight keys: ['lm_head.g_idx', 'lm_head.qweight', 'lm_head.qzeros', 'lm_head.scales', 'model.decoder.em…
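A quick way to confirm which modules were actually packed is to scan the state dict for the GPTQ-style tensors; a hypothetical helper, with `model` standing for the quantized OPT model above:
```python
def quantized_module_names(model):
    # modules that ended up with GPTQ-style packed weights
    sd = model.state_dict()
    return sorted({k.rsplit(".", 1)[0] for k in sd if k.endswith(".qweight")})

# with quant_lm_head=True, 'lm_head' should appear alongside the decoder layers
print(quantized_module_names(model))   # `model` is the quantized OPT model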
-
We encourage you to join the [MLX Community](https://huggingface.co/mlx-community) on Hugging Face 🤗 and upload new MLX converted models and versions of existing models.
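For context, conversion and upload typically goes through `mlx_lm`'s convert utility; the sketch below assumes its Python API (parameter names may differ between versions), and both repo names are placeholders:
```python
# sketch assuming mlx_lm's convert utility; argument names may differ by version
from mlx_lm import convert

convert(
    hf_path="mistralai/Mistral-7B-v0.1",                # hypothetical source model
    mlx_path="mlx_model",                               # local output directory
    quantize=True,                                      # quantize during conversion
    upload_repo="mlx-community/Mistral-7B-v0.1-4bit",   # hypothetical target repo
)
```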
-
### 🐛 Describe the bug
## Description
The outputs of the fully quantized and fake quantized models do not match, with the fully quantized model not matching the expected analytical results for a minima…
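For reference, here is what agreement between the two paths looks like in a minimal analytic case (per-tensor symmetric scales, values chosen to sit exactly on the int8 grid; the integer matmul is simulated exactly in fp32):
```python
import torch

torch.manual_seed(0)
s_w, s_x = 0.5, 0.25                             # per-tensor scales
w = torch.randint(-8, 8, (4, 4)).float() * s_w   # weights exactly on the int8 grid
x = torch.randint(-8, 8, (1, 4)).float() * s_x   # activations exactly on the grid

# fake quantized: quantize-dequantize in float, then a float matmul
w_fq = torch.round(w / s_w).clamp(-128, 127) * s_w
x_fq = torch.round(x / s_x).clamp(-128, 127) * s_x
y_fake = x_fq @ w_fq.t()

# fully quantized: integer matmul (simulated exactly in fp32), rescaled once at the end
w_q = torch.round(w / s_w).clamp(-128, 127)
x_q = torch.round(x / s_x).clamp(-128, 127)
y_full = (x_q @ w_q.t()) * (s_w * s_x)

print(torch.allclose(y_fake, y_full))   # True: the two paths agree analytically here
```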
-
### Describe the bug
Using quantization yields only minimal speedups on an A100.
### Your environment
#### OS
```
$ uname -a
Linux jean-zay4 4.18.…
```
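When triaging a report like this, it helps to time the kernels in isolation; a minimal sketch of a CUDA-event harness (an fp16 matmul is shown as the baseline; substitute the quantized op under test):
```python
import torch

def cuda_time_ms(fn, iters=100, warmup=10):
    # average wall time of a CUDA callable, measured with events after a warmup
    for _ in range(warmup):
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

x = torch.randn(16, 4096, device="cuda", dtype=torch.float16)
w = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
print("fp16 matmul:", cuda_time_ms(lambda: x @ w.t()), "ms")
```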
-
Hi,
Without using transformers / accelerate and so on, what are the constraints on a model for it to be tensor-parallelizable?
Does it need to be an nn.Sequential? Do input dimensions need to be alwa…
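For intuition, the core requirement is just that each weight has a dimension you can shard and a cheap way to recombine the partial results; no nn.Sequential is needed. A single-process sketch of a column-parallel split:
```python
import torch
import torch.nn as nn

# column-parallel split of a single Linear: each shard owns a slice of the
# output features, and a concat recombines the partial results
full = nn.Linear(8, 6, bias=False)
w0, w1 = full.weight.chunk(2, dim=0)   # shard the output dim across two "devices"

x = torch.randn(3, 8)
y_full = full(x)
y_sharded = torch.cat([x @ w0.t(), x @ w1.t()], dim=-1)
print(torch.allclose(y_full, y_sharded, atol=1e-6))   # True
```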
-
## 🐛 Bug
Seeing errors when trying to trace simple models based on `nn.Sequential`:
```
Traceback (most recent call last):
  File "/home/vasiliy/nfs/pytorch_scripts/gm_sequential_bug.py", line…
```
-
When I quantize a model, the average loss is lower in earlier layers (0.02) than in later layers (2.0). I'm curious whether the quantization has failed because of such a large average loss?
And from experience, …
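One way to judge whether a layerwise loss of around 2.0 is actually harmful is to measure the relative output error per layer rather than the raw loss; a minimal sketch with a placeholder layer and input (symmetric per-tensor weight quantization):
```python
import torch
import torch.nn as nn

def layer_quant_error(layer: nn.Linear, x: torch.Tensor, bits: int = 8) -> float:
    # relative output error after symmetric per-tensor weight quantization
    qmax = 2 ** (bits - 1) - 1
    scale = layer.weight.abs().max() / qmax
    w_q = torch.round(layer.weight / scale).clamp(-qmax, qmax) * scale
    y = x @ layer.weight.t()
    y_q = x @ w_q.t()
    return ((y - y_q).norm() / y.norm()).item()

layer = nn.Linear(64, 64, bias=False)   # placeholder layer
x = torch.randn(8, 64)                  # placeholder activations
print(layer_quant_error(layer, x))
```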