-
I have installed **intel-extension-for-transformers** using `pip install intel-extension-for-transformers`, but when I tried a small script to check that it worked, I got this error:
Traceback (most recent c…
-
https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md#examples
How do I set `eval_func`?
https://github.com/intel/neural-compressor/blob/master/examples/3…
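For context, Neural Compressor expects `eval_func` to be a callable that takes a candidate model and returns a single scalar where higher means more accurate; the tuner calls it after each quantization attempt. A minimal sketch, assuming a toy model and dataloader (both stand-ins, not Neural Compressor API):

```python
# Hedged sketch: eval_func(model) -> float, higher is better.
# `dataloader` and `model` below are illustrative stand-ins.

def make_eval_func(dataloader):
    def eval_func(model):
        correct, total = 0, 0
        for inputs, labels in dataloader:
            preds = model(inputs)  # model returns predicted labels here
            correct += sum(p == l for p, l in zip(preds, labels))
            total += len(labels)
        return correct / total     # one scalar the tuner can compare
    return eval_func

# Usage with neural-compressor (check the docs for your exact version):
# from neural_compressor import PostTrainingQuantConfig, quantization
# q_model = quantization.fit(model=fp32_model,
#                            conf=PostTrainingQuantConfig(),
#                            calib_dataloader=calib_dl,
#                            eval_func=make_eval_func(eval_dl))
```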
-
Hello,
The biggest and most important library is not mentioned. https://github.com/intel/neural-compressor
-
Hi all,
I'm attempting to follow the SmoothQuant tutorial for the LLAMA2-7b model: [https://github.com/intel/neural-compressor/tree/master/examples/onnxrt/nlp/huggingface_model/text_generation/llam…
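For readers unfamiliar with the tutorial's technique: SmoothQuant migrates quantization difficulty from activations to weights with a per-input-channel scale, s_j = max|X_j|^α / max|W_j|^(1−α), leaving the matmul result unchanged. A minimal NumPy sketch of that scale computation (illustrative, not the tutorial's code):

```python
import numpy as np

def smoothquant_scales(X, W, alpha=0.5, eps=1e-8):
    """Per-input-channel smoothing scales, SmoothQuant-style:
    s_j = max|X_j|**alpha / max|W_j|**(1 - alpha)."""
    act_max = np.abs(X).max(axis=0) + eps  # X has shape (n, c_in)
    w_max = np.abs(W).max(axis=1) + eps    # W has shape (c_in, c_out)
    return act_max ** alpha / w_max ** (1 - alpha)

def smooth(X, W, alpha=0.5):
    """Divide activations and multiply weights by s; X @ W is unchanged."""
    s = smoothquant_scales(X, W, alpha)
    return X / s, W * s[:, None]
```

The point is that the smoothed activations have a smaller dynamic range, so they quantize with less error, while the product is mathematically identical.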
-
I'm not sure if I'm missing an option somewhere, but AWQ quantization for large ONNX models is very slow. When quantizing a 7B LLaMA model, the following four `np.matmul` calls take forever to execute, a…
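For context on what those matmuls are feeding: AWQ ultimately emits group-wise low-bit weights. A minimal NumPy sketch of symmetric group-wise 4-bit quantization (illustrative of the output format only, not the AWQ search itself):

```python
import numpy as np

def quantize_int4_groupwise(w, group_size=128):
    """Symmetric 4-bit group-wise quantization of a flat weight vector.
    Returns integer codes in [-8, 7] and one fp scale per group."""
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)       # avoid div-by-zero groups
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Reconstruct an approximation of the original weights."""
    return (q.astype(np.float32) * scale).reshape(-1)
```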
-
Hi all,
I have been trying to apply **post-training quantization** to a custom vision model (a pretrained VGG16 model) that I have already finetuned on "xpu" (Intel GPU Max Series). I have saved …
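As background for the question above, the core of post-training static quantization is calibration: observe activation min/max over a few batches, derive an affine scale and zero-point, and map floats to uint8. A minimal NumPy sketch of that step (illustrative, not the Neural Compressor implementation):

```python
import numpy as np

def calibrate_minmax(batches):
    """Observe the global min/max over calibration batches."""
    lo = min(float(b.min()) for b in batches)
    hi = max(float(b.max()) for b in batches)
    return lo, hi

def affine_qparams(lo, hi, qmin=0, qmax=255):
    """Affine (asymmetric) scale and zero-point for uint8."""
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard against hi == lo
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

def quantize(x, scale, zp, qmin=0, qmax=255):
    return np.clip(np.round(x / scale) + zp, qmin, qmax).astype(np.uint8)
```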
-
Hi Team,
I have converted a normal T5-small model to ONNX using onnxruntime 1.15.1 and Python 3.10.12, ran it on an Intel processor and an AMD processor, but received different responses! Please let me know how to us…
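Worth noting here: small floating-point differences across CPUs (different instruction sets, reduction orders) are expected, so raw outputs should be compared with a tolerance rather than for exact equality. A minimal sketch of such a comparison (tolerances are illustrative):

```python
import numpy as np

def max_abs_diff(a, b):
    """Largest element-wise difference between two output tensors."""
    a, b = np.asarray(a, np.float64), np.asarray(b, np.float64)
    return float(np.abs(a - b).max())

def outputs_close(a, b, rtol=1e-3, atol=1e-5):
    """True if two model outputs agree within tolerance; tiny per-element
    drift across CPUs is normal and not a correctness bug."""
    a, b = np.asarray(a, np.float64), np.asarray(b, np.float64)
    return bool(np.allclose(a, b, rtol=rtol, atol=atol))
```

For a generative model like T5, compare logits rather than decoded text: a tiny logit difference near an argmax tie can change the chosen token and then every token after it, even though both runs are numerically fine.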
-
While quantizing my model with the "basic" tuning strategy, I ran into this issue during one of the phases:
```
...
2024-02-21 23:25:49 [INFO] Tune 73 result is: [Accuracy (int8|fp32): 0.0035|0.0000…
-
**Describe the bug**
When loading TinyLlama or Llama-3-8B with dtype=int4, the model structure looks like this:
```
LlamaForCausalLM(
(model): LlamaModel(
(embed_tokens): Embedding(128256, 4096)
…
-
### System Info
```shell
Transformers fails with the following error when trying to use AWQ with TGI / the neural compression engine, or optimum-habana
ValueError: AWQ is only available on GPU
```
#…