-
I have installed **intel-extension-for-transformers** using `pip install intel-extension-for-transformers`, but when I tried a small script to check that it worked, I got this error:
Traceback (most recent c…
-
https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#examples
How do I set `eval_func`?
https://github.com/intel/neural-compressor/blob/master/examples/3…
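For context, Neural Compressor expects `eval_func` to be a callable that takes a candidate model and returns a single scalar where higher means more accurate; the tuner calls it after each quantization attempt. A minimal sketch, assuming a toy model and dataloader (both stand-ins, not Neural Compressor API):

```python
# Hedged sketch: eval_func(model) -> float, higher is better.
# `dataloader` and `model` below are illustrative stand-ins.

def make_eval_func(dataloader):
    def eval_func(model):
        correct, total = 0, 0
        for inputs, labels in dataloader:
            preds = model(inputs)  # model returns predicted labels here
            correct += sum(p == l for p, l in zip(preds, labels))
            total += len(labels)
        return correct / total     # one scalar the tuner can compare
    return eval_func

# Usage with neural-compressor (check the docs for your exact version):
# from neural_compressor import PostTrainingQuantConfig, quantization
# q_model = quantization.fit(model=fp32_model,
#                            conf=PostTrainingQuantConfig(),
#                            calib_dataloader=calib_dl,
#                            eval_func=make_eval_func(eval_dl))
```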
-
Hello,
The biggest and most important library is not mentioned. https://github.com/intel/neural-compressor
-
Hi all,
I'm attempting to follow the SmoothQuant tutorial for the LLAMA2-7b model: [https://github.com/intel/neural-compressor/tree/master/examples/onnxrt/nlp/huggingface_model/text_generation/llam…
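For readers unfamiliar with the tutorial's technique: SmoothQuant migrates quantization difficulty from activations to weights with a per-input-channel scale, s_j = max|X_j|^α / max|W_j|^(1−α), leaving the matmul result unchanged. A minimal NumPy sketch of that scale computation (illustrative, not the tutorial's code):

```python
import numpy as np

def smoothquant_scales(X, W, alpha=0.5, eps=1e-8):
    """Per-input-channel smoothing scales, SmoothQuant-style:
    s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)."""
    act_max = np.abs(X).max(axis=0) + eps  # X has shape (n, c_in)
    w_max = np.abs(W).max(axis=1) + eps    # W has shape (c_in, c_out)
    return act_max ** alpha / w_max ** (1 - alpha)

def smooth(X, W, alpha=0.5):
    """Divide activations and multiply weights by s; X @ W is unchanged."""
    s = smoothquant_scales(X, W, alpha)
    return X / s, W * s[:, None]
```

The point is that the smoothed activations have a smaller dynamic range, so they quantize with less error, while the product is mathematically identical.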
-
I'm not sure if I'm missing an option somewhere, but AWQ quantization for large ONNX models is very slow. When quantizing a 7B LLaMA model, the following four `np.matmul` calls take forever to execute, a…
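For context on what those matmuls are feeding: AWQ ultimately emits group-wise low-bit weights. A minimal NumPy sketch of symmetric group-wise 4-bit quantization (illustrative of the output format only, not the AWQ search itself):

```python
import numpy as np

def quantize_int4_groupwise(w, group_size=128):
    """Symmetric 4-bit group-wise quantization of a flat weight vector.
    Returns integer codes in [-8, 7] and one fp scale per group."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)       # avoid div-by-zero groups
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct an approximation of the original weights."""
    return (q.astype(np.float32) * scale).reshape(-1)
```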
-
Hi all,
I have been trying to apply **post-training quantization** to a custom vision model (a pretrained VGG16 model) that I have already finetuned on "xpu" (Intel GPU Max Series). I have saved …
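As background for the question above, the core of post-training static quantization is calibration: observe activation min/max over a few batches, derive an affine scale and zero-point, and map floats to uint8. A minimal NumPy sketch of that step (illustrative, not the Neural Compressor implementation):

```python
import numpy as np

def calibrate_minmax(batches):
    """Observe the global min/max over calibration batches."""
    lo = min(float(b.min()) for b in batches)
    hi = max(float(b.max()) for b in batches)
    return lo, hi

def affine_qparams(lo, hi, qmin=0, qmax=255):
    """Affine (asymmetric) scale and zero-point for uint8."""
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against hi == lo
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

def quantize(x, scale, zp, qmin=0, qmax=255):
    return np.clip(np.round(x / scale) + zp, qmin, qmax).astype(np.uint8)
```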
-
Hi Team,
I have converted a normal T5-small model to ONNX using onnxruntime 1.15.1 and Python 3.10.12, ran it on an Intel processor and an AMD processor, but received different responses! Please let me know how to us…
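Worth noting here: small floating-point differences across CPUs (different instruction sets, reduction orders) are expected, so raw outputs should be compared with a tolerance rather than for exact equality. A minimal sketch of such a comparison (tolerances are illustrative):

```python
import numpy as np

def max_abs_diff(a, b):
    """Largest element-wise difference between two output tensors."""
    a, b = np.asarray(a, np.float64), np.asarray(b, np.float64)
    return float(np.abs(a - b).max())

def outputs_close(a, b, rtol=1e-3, atol=1e-5):
    """True if two model outputs agree within tolerance; tiny per-element
    drift across CPUs is normal and not a correctness bug."""
    a, b = np.asarray(a, np.float64), np.asarray(b, np.float64)
    return bool(np.allclose(a, b, rtol=rtol, atol=atol))
```

For a generative model like T5, compare logits rather than decoded text: a tiny logit difference near an argmax tie can change the chosen token and then every token after it, even though both runs are numerically fine.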
-
While quantizing my model with the "basic" tuning strategy, I ran into this issue during one of the phases:
```
...
2024-02-21 23:25:49 [INFO] Tune 73 result is: [Accuracy (int8|fp32): 0.0035|0.0000…
-
**Describe the bug**
When loading TinyLlama or Llama-3-8B with dtype=int4, the model structure looks like this:
```
LlamaForCausalLM(
(model): LlamaModel(
(embed_tokens): Embedding(128256, 4096)
…
-
### System Info
```shell
Transformers fails with the following error when trying to use AWQ with TGI / the neural compression engine, or optimum-habana
ValueError: AWQ is only available on GPU
```
#…