-
Hi,
I don't know anything about ML... but I've been using Tract as part of a benchmark suite to look at WebAssembly performance. Today I tried running quantized models, but I got an error:
`Transla…
-
Hello authors,
Thank you for your excellent work.
I've tried using AIMET to resolve a severe performance degradation issue caused by quantization while using the SNPE library. However, I've …
-
## ❓ Question
I have a PTQ model and a QAT model trained with the official PyTorch API following the quantization tutorial, and I wish to deploy them on TensorRT for inference. The model is metaforme…
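For context on the PTQ side of the question: post-training quantization derives its scale and zero-point from calibration statistics rather than from training. The toy sketch below shows the idea behind a min-max calibration pass (pure Python for illustration; this is not the PyTorch observer API, and the function name is made up):

```python
# Toy min-max calibration: the idea behind PTQ observers.
# Pick scale/zero_point so the observed float range maps onto uint8 [0, 255].
def minmax_calibrate(samples):
    lo, hi = min(samples), max(samples)
    # Extend the range to include 0 so that 0.0 is exactly representable,
    # which quantized kernels rely on for padding/zero values.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / 255.0
    zero_point = round(-lo / scale)
    return scale, zero_point

scale, zp = minmax_calibrate([-1.0, 0.2, 3.0])
# scale = 4.0 / 255 ≈ 0.0157, zero_point = 64
```

QAT instead simulates this rounding during training so the weights adapt to it, which is why the two models typically need different handling when exported to a backend like TensorRT.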
-
### System Info
transformers.js 2.17.2
### Environment/Platform
- [X] Website/web-app
- [ ] Browser extension
- [ ] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Desktop app (e.g., Electron)
- [ ] O…
-
**Describe the bug**
When I run the example from examples/python/awq-quantized-model.md, but with phi-3 swapped out for llama-3.2-3b, I get an error message stating that `AttributeError: 'NoneType' objec…
-
### Feature request
Hi, I've created a 4-bit quantized model using `BitsAndBytesConfig`, for example
```
from transformers import AutoModelForTokenClassification, BitsAndBytesConfig
from optim…
-
**Describe the bug**
1) The step to download the examples is missing; add a step to the set-up section to first run:
```
cd olive
py -m pip install -r requirements.txt
```
2) then there are examples like for …
-
### Describe the issue
With the QNN execution provider, we see that loading the first model allocates ~800 MB of memory, and each subsequent model load allocates another ~100 MB. When destroyin…
-
The current proposal has support for quantized types like `tensor-quant8-asymm`, and some operators support them. Many networks run in mixed precision, i.e. a quantized-output matrix multiply followed by…
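For readers unfamiliar with the type name: `tensor-quant8-asymm` denotes the standard affine (asymmetric) mapping real ≈ scale × (q − zero_point) onto uint8. A minimal sketch of that mapping (illustrative only, not part of the proposal's API):

```python
# Affine (asymmetric) uint8 quantization: real ≈ scale * (q - zero_point).
def quantize(x, scale, zero_point):
    # Map a float to uint8, clamping to the representable range [0, 255].
    q = round(x / scale) + zero_point
    return max(0, min(255, q))

def dequantize(q, scale, zero_point):
    # Recover the approximate real value from the stored uint8.
    return scale * (q - zero_point)

scale, zp = 0.05, 128            # example parameters
q = quantize(1.0, scale, zp)     # -> 148
x = dequantize(q, scale, zp)     # -> 1.0 (exactly representable here)
```

Mixed-precision pipelines interleave these representations, which is why an operator set that only partially supports quantized tensors forces extra dequantize/requantize hops at the boundaries.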
-
### Describe the issue
Hi!
I've been building ORT using the command and noticed that binary operators like _Add_ are executed by the Eigen library. I did some debugging and noticed Eigen is using t…