-
Hi, I failed to quantize an ONNX model (weights stored as fp16) to int8 because of an overflow.
The following code is from `modelopt.onnx.quantization.ort_patching`:
```python
threshold = max(abs(min_value), a…
```
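To illustrate the kind of overflow I am hitting, here is a hypothetical sketch (the `weights` values are made up): range arithmetic done in fp16 overflows to inf near the fp16 limit of 65504, while upcasting to float32 first keeps the result finite.
```python
import numpy as np

# fp16 only represents finite values up to 65504, so range/threshold
# arithmetic performed in fp16 overflows to inf for large weights.
weights = np.array([-60000.0, 60000.0], dtype=np.float16)  # hypothetical tensor
print(weights.max() - weights.min())  # inf: 120000 exceeds the fp16 limit

# Workaround sketch: upcast to float32 before computing the threshold.
w32 = weights.astype(np.float32)
threshold = max(abs(float(w32.min())), abs(float(w32.max())))
print(threshold)  # 60000.0, finite in fp32
```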
-
The inference speed of the int8-quantized version of SDXL is much slower than that of fp16. I am running the TensorRT 9.3 SDXL demo, and here is the result (I changed the shape to 768x1344 manually):
fp16 : pyt…
-
The training process is quite slow, whereas using 8-bit HQQ speeds it up by more than tenfold. Is this normal, or have I missed something in the code?
```python
import torch
from transformers import EetqConfi…
```
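Presumably the snippet continues with the standard `EetqConfig` loading path in transformers; a minimal sketch of that path (the model id is a placeholder):
```python
import torch
from transformers import AutoModelForCausalLM, EetqConfig

# Quantize the linear layers to int8 with EETQ at load time.
quantization_config = EetqConfig("int8")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    quantization_config=quantization_config,
    device_map="auto",
)
```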
-
Is there any plan to add int8 quantization support on GPU for GPT-2 or other transformer models? Thanks.
-
### Describe the issue
Now I'm replicating this [implementation](https://intel.github.io/intel-extension-for-pytorch/llm/cpu/#compile-from-source):
pytorch=2.1.0.dev20230711+cpu
intel-extension-for…
-
### System information
- OS: openSUSE Tumbleweed (Linux)
- TensorFlow installation: pip
- TensorFlow version: tf-nightly (occurs on earlier versions too)
### Code
Converting a model containing an …
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS…
-
I tried running `ipex.optimize` followed by tracing/scripting. I am not able to see any fusion groups in the IR (`torch.jit.last_executed_optimized_graph()`). Is there any way to get the fusion groups other t…
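For reference, a minimal sketch of the workflow I mean (a toy conv+relu model stands in for the real one); the warm-up runs matter because the profiling executor only finalizes its optimized graph after a few executions:
```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()
x = torch.randn(1, 3, 32, 32)

model = ipex.optimize(model)
with torch.no_grad():
    traced = torch.jit.trace(model, x)
    traced = torch.jit.freeze(traced)
    # Warm-up: the profiling executor finalizes fusions only after a few runs.
    for _ in range(3):
        traced(x)

# Expecting fused groups here, but none show up.
print(torch.jit.last_executed_optimized_graph())
```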
-
### 1. System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Win 10 22H2 (but reproducible elsewhere)
- TensorFlow installation (pip package or built from source): pip pack…
-
Hi, I read the docs about `zero_quant`, but it seems to require extra training.
And in `deepspeed.init_inference`, the `dtype` can be set to int8, but the code does nothing for int8. https://github…
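For concreteness, a minimal sketch of the call in question (the model and settings here are placeholders):
```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# dtype=torch.int8 is accepted, but it is unclear what (if anything)
# the int8 path actually does here.
engine = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.int8,
    replace_with_kernel_inject=True,
)
```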