-
Loading a saved model runs into the following error.
It also takes a very long time to run and save quantized models.
```
2024-03-21 08:48:58 [INFO] loading weights file models/4_bit_llama2-rtn/model.sa…
```
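For context, a minimal save/reload round trip with the INC 2.x PyTorch API might look like the sketch below; the tiny stand-in model, the paths, and the location of the `load` helper are assumptions, not the reporter's actual setup.
```
import torch
from neural_compressor import PostTrainingQuantConfig, quantization
from neural_compressor.utils.pytorch import load  # assumed location of the load helper

# Stand-in fp32 model; the real case is a 4-bit RTN-quantized Llama 2.
fp32_model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

# RTN weight-only quantization; no calibration data needed (assumption).
conf = PostTrainingQuantConfig(approach="weight_only")
q_model = quantization.fit(fp32_model, conf)
q_model.save("./4_bit_rtn_model")

# Reloading uses the original fp32 model as a scaffold.
reloaded = load("./4_bit_rtn_model", fp32_model)
```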
-
**Describe the bug and context**
I'm trying to quantize an optimized Stable Diffusion model.
I learned that `IncDynamicQuantization` causes a smaller reduction in inference speed than `OnnxDynamicQuanti…
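If it helps to compare, the two approaches presumably correspond to something like the calls below; the model filenames are placeholders, the ONNX Runtime call is its documented dynamic-quantization entry point, and the INC call shape is an assumption about the 2.x API.
```
# ONNX Runtime dynamic quantization: int8 weights, activations quantized at runtime.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic("unet_fp32.onnx", "unet_ort_int8.onnx",
                 weight_type=QuantType.QInt8)

# Intel Neural Compressor dynamic quantization (assumed 2.x API).
from neural_compressor import PostTrainingQuantConfig, quantization

conf = PostTrainingQuantConfig(approach="dynamic")
q_model = quantization.fit("unet_fp32.onnx", conf)
q_model.save("unet_inc_int8.onnx")
```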
-
I evaluated all of the classification models against ImageNet according to their preprocessing descriptions (a preprocessing sketch follows the list):
Models:
-----------
- squeezenet1.0-12.onnx
- bvlcalexnet-12.onnx
- caffenet-12.onnx
…
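For the torchvision-style entries in that list, the documented ImageNet preprocessing is roughly the sketch below; note that the caffe-derived models (bvlcalexnet, caffenet) document different mean handling, so these mean/std values are the common torchvision convention, not universal across the list.
```
import numpy as np
from PIL import Image

def preprocess(path):
    """Resize to 256, center-crop 224, normalize, return 1x3x224x224 float32."""
    img = Image.open(path).convert("RGB").resize((256, 256))
    off = (256 - 224) // 2
    img = img.crop((off, off, off + 224, off + 224))
    x = np.asarray(img, dtype=np.float32) / 255.0
    # Torchvision-convention mean/std; caffe-style models use raw-scale means.
    x = (x - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]
    return x.transpose(2, 0, 1)[None].astype(np.float32)  # HWC -> NCHW + batch
```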
-
As mentioned in the paper "TEQ: Trainable Equivalent Transformation for Quantization of LLMs",
the authors claim: "The training process is lightweight, requiring only 1K steps …
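In INC 2.x, TEQ is exposed as one of the weight-only algorithms selectable through op_type_dict; a minimal sketch, assuming that API (the toy model, random calibration data, and exact key names are illustrative, not a verified recipe):
```
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig, quantization

fp32_model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
# TEQ is trainable, so calibration data is needed (stand-in random data).
calib = DataLoader(TensorDataset(torch.randn(32, 64)), batch_size=8)

conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # ".*" matches every op
            "weight": {
                "bits": 4,
                "group_size": 128,
                "scheme": "asym",
                "algorithm": "TEQ",
            },
        },
    },
)
q_model = quantization.fit(fp32_model, conf, calib_dataloader=calib)
```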
-
Hi team, I am having an issue quantizing a network consisting of Conv and Linear layers using **int8** weights and activations in ONNX. I have tried setting this via op_type_dict; however, it doesn't wo…
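One common pitfall here: in an ONNX graph, Linear layers surface as MatMul (or Gemm) nodes, so op_type_dict must key on the ONNX op names. A sketch under that assumption; the model path, dummy-data shape, and dataset/dataloader helpers are placeholders patterned on the INC README examples:
```
from neural_compressor import PostTrainingQuantConfig, quantization
from neural_compressor.data import DataLoader, Datasets

# Dummy calibration data matched to the model's input shape (placeholder).
dataset = Datasets("onnxrt_qlinearops")["dummy"](shape=(10, 3, 224, 224))
calib = DataLoader(framework="onnxruntime", dataset=dataset)

# Key on ONNX op names: Linear layers appear as MatMul (or Gemm) nodes.
conf = PostTrainingQuantConfig(
    approach="static",
    op_type_dict={
        "Conv": {
            "weight": {"dtype": ["int8"]},
            "activation": {"dtype": ["int8"]},
        },
        "MatMul": {
            "weight": {"dtype": ["int8"]},
            "activation": {"dtype": ["int8"]},
        },
    },
)
q_model = quantization.fit("model.onnx", conf, calib_dataloader=calib)
```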
-
### Add Link
https://pytorch.org/tutorials/recipes/intel_neural_compressor_for_pytorch.html
### Describe the bug
Following the tutorial, I wrote this code and found that a segmentation fault occurs w…
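For anyone trying to reproduce, a stripped-down version of the tutorial's flow on the 2.x API is sketched below; the toy model and random data are stand-ins for the reporter's code, which may help localize where the segmentation fault occurs.
```
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig, quantization

model = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.ReLU())
data = DataLoader(
    TensorDataset(torch.randn(16, 10), torch.zeros(16, dtype=torch.long)),
    batch_size=4,
)

conf = PostTrainingQuantConfig(approach="static")
q_model = quantization.fit(model, conf, calib_dataloader=data)
q_model.save("./quantized_model")
```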
-
I followed the quick start guide, and an error occurred when I tried to run the Python script. It seems to be a dependency error. I searched the internet and did not find a solution. How to solve t…
-
https://github.com/intel/neural-compressor/tree/master/examples/onnxrt/nlp/huggingface_model/text_generation/llama/quantization/weight_only
```
bash run_quant.sh --input_model=./Meta-Llama-3.1-8B -…
```
-
The PostTrainingQuantConfig below produces fp32 ops for the NPU on 2.4.1; models with int8 and fp16 ops would be preferred for the NPU.
```
conf=PostTrainingQuantConfig(quant_level='auto',
                             device='n…
```
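Purely as an illustrative guess at the truncated fields, the full config may have looked like the sketch below; the device string is a hypothetical completion, and nothing here is a verified 2.4.1 recipe for int8/fp16 NPU output.
```
from neural_compressor import PostTrainingQuantConfig

# Hypothetical completion of the truncated config; "npu" is a guess.
conf = PostTrainingQuantConfig(
    quant_level="auto",
    device="npu",
)
```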
-
### Describe the issue
I am using the int8-quantized version of the BGE-reranker-base model converted to ONNX, and I am processing the inputs in batches. The scenario is that I am experiencing …
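For reference, a typical batched scoring loop for an int8 ONNX reranker looks like the sketch below; the model path, tokenizer checkpoint, and input names are assumptions based on the usual BGE export, not the reporter's files.
```
import onnxruntime as ort
from transformers import AutoTokenizer

# Assumed paths/names; the real model is the reporter's int8 ONNX export.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-base")
session = ort.InferenceSession("bge_reranker_int8.onnx",
                               providers=["CPUExecutionProvider"])

query = "what is quantization?"
docs = ["Quantization lowers numeric precision.",
        "Bears hibernate in winter."]

# Pad to the longest sequence in the batch; ragged batches change the
# padded length per call, which can make per-batch latency uneven.
enc = tokenizer([query] * len(docs), docs,
                padding=True, truncation=True, return_tensors="np")
scores = session.run(None, {"input_ids": enc["input_ids"],
                            "attention_mask": enc["attention_mask"]})[0]
print(scores.squeeze(-1))
```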