-
Hi,
I don't know anything about ML... but I've been using Tract as part of a benchmark suite to look at WebAssembly performance. Today I tried running quantized models, but I got an error:
`Transla…
-
Hello authors,
Thank you for your excellent work.
I've tried using AIMET to resolve a severe performance degradation issue caused by quantization while using the SNPE library. However, I've …
-
## ❓ Question
I have a PTQ model and a QAT model trained with the official PyTorch API following the quantization tutorial, and I wish to deploy them on TensorRT for inference. The model is metaforme…
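For context on the PTQ side of the question: post-training quantization derives its scale and zero-point from calibration statistics rather than from training. The toy sketch below shows the idea behind a min-max calibration pass (pure Python for illustration; this is not the PyTorch observer API, and the function name is made up):

```python
# Toy min-max calibration: the idea behind PTQ observers.
# Pick scale/zero_point so the observed float range maps onto uint8 [0, 255].
def minmax_calibrate(samples):
    lo, hi = min(samples), max(samples)
    # Extend the range to include 0 so that 0.0 is exactly representable,
    # which quantized kernels rely on for padding/zero values.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / 255.0
    zero_point = round(-lo / scale)
    return scale, zero_point

scale, zp = minmax_calibrate([-1.0, 0.2, 3.0])
# scale = 4.0 / 255 ≈ 0.0157, zero_point = 64
```

QAT instead simulates this rounding during training so the weights adapt to it, which is why the two models typically need different handling when exported to a backend like TensorRT.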
-
### System Info
transformers.js 2.17.2
### Environment/Platform
- [X] Website/web-app
- [ ] Browser extension
- [ ] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Desktop app (e.g., Electron)
- [ ] O…
-
**Describe the bug**
When I run the example from examples/python/awq-quantized-model.md, but with phi-3 swapped out for llama-3.2-3b, I get an error message stating that `AttributeError: 'NoneType' objec…
-
### Feature request
Hi, I've created a 4-bit quantized model using `BitsAndBytesConfig`, for example
```
from transformers import AutoModelForTokenClassification, BitsAndBytesConfig
from optim…
-
**Describe the bug**
1) The step to download the examples is missing; add a step to the set-up section to first run:
```
cd olive
py -m pip install -r requirements.txt
```
2) then there are examples like for …
-
### Describe the issue
With the QNN execution provider, we see that loading the first model allocates ~800 MB of memory, and each subsequent model load allocates another ~100 MB. When destroyin…
-
The current proposal has support for quantized types like `tensor-quant8-asymm`, and some operators support them. Many networks run in mixed precision, i.e. a quantized-output matrix multiply followed by…
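For readers unfamiliar with the type name: `tensor-quant8-asymm` denotes the standard affine (asymmetric) mapping real ≈ scale × (q − zero_point) onto uint8. A minimal sketch of that mapping (illustrative only, not part of the proposal's API):

```python
# Affine (asymmetric) uint8 quantization: real ≈ scale * (q - zero_point).
def quantize(x, scale, zero_point):
    # Map a float to uint8, clamping to the representable range [0, 255].
    q = round(x / scale) + zero_point
    return max(0, min(255, q))

def dequantize(q, scale, zero_point):
    # Recover the approximate real value from the stored uint8.
    return scale * (q - zero_point)

scale, zp = 0.05, 128            # example parameters
q = quantize(1.0, scale, zp)     # -> 148
x = dequantize(q, scale, zp)     # -> 1.0 (exactly representable here)
```

Mixed-precision pipelines interleave these representations, which is why an operator set that only partially supports quantized tensors forces extra dequantize/requantize hops at the boundaries.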
-
### Describe the issue
Hi!
I've been building ORT using the command and noticed that binary operators like _Add_ are executed by the Eigen library. I did some debugging and noticed Eigen is using t…