-
### Description
With int8 & int4 and any further quantization schemes we will provide, it is possible that, in order to achieve adequate recall, some oversampling & rescoring with the raw float32 vectors might…
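For intuition, here is a minimal sketch of oversampling plus float32 rescoring. The function name, the scalar `scale`, and the int8 scoring scheme are illustrative assumptions, not this project's API:

```python
import numpy as np

# Hypothetical sketch: retrieve k * oversample candidates with cheap int8
# scoring, then rescore only those candidates with the raw float32 vectors.
def search_with_rescore(query_f32, docs_f32, docs_i8, scale, k=10, oversample=4):
    # Phase 1: approximate scores against the int8-quantized vectors.
    approx_scores = (docs_i8.astype(np.float32) * scale) @ query_f32
    candidates = np.argpartition(-approx_scores, k * oversample)[: k * oversample]

    # Phase 2: exact float32 dot products over the small candidate set.
    exact_scores = docs_f32[candidates] @ query_f32
    return candidates[np.argsort(-exact_scores)[:k]]
```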
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussions) and f…
-
Right now ao works just fine for quantizing an arbitrary HF model.
However, this simple workflow is failing, meaning we don't really interoperate well with the rest of the HF ecosystem:
```python
from tr…
```
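For reference, a minimal sketch of the kind of round trip that exercises this interop, assuming torchao's `quantize_` API and a standard `transformers` checkpoint (the model name and save path are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import quantize_, int4_weight_only

# Load an arbitrary HF model and quantize it in place with torchao...
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", torch_dtype=torch.bfloat16
)
quantize_(model, int4_weight_only())

# ...then hand it back to the HF ecosystem. Serialization steps like this
# are where the interop can break down (safetensors may reject the
# quantized tensor subclasses, hence safe_serialization=False here).
model.save_pretrained("opt-125m-int4", safe_serialization=False)
```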
-
- A16W4 axis=1
- Low-hanging fruit we can add to int4wo quant, either as a flag or by replacing the quant method (see the sketch after this list)
- [x] test eval with HQQ axis=1 and compare to the existing version
- if axis…
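To make "A16W4 axis=1" concrete, here is an illustrative asymmetric 4-bit group quantizer along axis=1 (the input-channel axis), with activations left in 16-bit. This is plain PyTorch for illustration only, not the int4wo or HQQ implementation:

```python
import torch

# Illustration: 4-bit affine quantization, grouped along axis=1.
def quantize_a16w4_axis1(w: torch.Tensor, group_size: int = 128):
    out_feat, in_feat = w.shape
    g = w.reshape(out_feat, in_feat // group_size, group_size)
    wmin = g.amin(dim=-1, keepdim=True)
    wmax = g.amax(dim=-1, keepdim=True)
    scale = (wmax - wmin).clamp(min=1e-6) / 15.0      # 4 bits -> 16 levels
    zero = (-wmin / scale).round()
    q = ((g / scale) + zero).round().clamp(0, 15).to(torch.uint8)
    return q, scale, zero

def dequantize(q, scale, zero, shape):
    return ((q.float() - zero) * scale).reshape(shape)

w = torch.randn(256, 512)
q, s, z = quantize_a16w4_axis1(w)
err = (w - dequantize(q, s, z, w.shape)).abs().mean()
```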
-
I'm currently exploring embedding quantization strategies to enhance storage and computation efficiency while maintaining high accuracy. Specifically, I'm looking at integrating these strategies with …
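As one concrete reference point, a common embedding quantization strategy is int8 scalar quantization with per-dimension calibration ranges. The sketch below is an illustrative assumption (names, percentile clipping, and data are placeholders, not tied to any particular library):

```python
import numpy as np

# Calibrate per-dimension ranges on a sample of embeddings, then map
# each float32 dimension to int8 (~4x storage reduction).
def calibrate(sample: np.ndarray):
    lo = np.percentile(sample, 0.1, axis=0)    # clip extreme outliers
    hi = np.percentile(sample, 99.9, axis=0)
    return lo, hi

def to_int8(emb: np.ndarray, lo, hi):
    scale = (hi - lo) / 255.0
    q = np.clip((emb - lo) / scale - 128.0, -128, 127)
    return q.round().astype(np.int8)

sample = np.random.randn(10_000, 384).astype(np.float32)  # placeholder data
lo, hi = calibrate(sample)
q = to_int8(sample[:5], lo, hi)
```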
-
After successfully quantizing and exporting ONNX models for ResNet18 using two different modes, `int8` and `fp8`, I am trying to convert these ONNX models to TRT, but no luck so far. It returns an error: No sup…
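For context, the usual way to build an engine from a QDQ-quantized int8 ONNX model with the TensorRT Python API looks roughly like the sketch below (the file path is a placeholder, and this is the standard builder flow rather than a fix for the error above):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("resnet18_int8_qdq.onnx", "rb") as f:   # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))             # surfaces unsupported-op errors

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)              # honor the QDQ int8 ops
engine = builder.build_serialized_network(network, config)
```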
-
I used this script to build int8, but it failed: https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/language_model/llama
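For reference, ONNX Runtime's generic int8 entry point looks like the following; the paths are placeholders, and the linked LLaMA script layers calibration and model-specific handling on top of this:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic (weight-only) int8 quantization of an exported ONNX model.
quantize_dynamic(
    model_input="llama.onnx",        # placeholder input path
    model_output="llama_int8.onnx",  # placeholder output path
    weight_type=QuantType.QInt8,
)
```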
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussions) and f…
-
Hi again,
I've successfully quantized an ONNX model to int8, then converted it to a TensorRT engine, and noticed the performance increase compared to fp16.
```bash
python -m modelopt.onnx.quantizati…
```
-
I used `mtq.INT8_DEFAULT_CFG` as recommended for CNN networks (`mtq.quantize(model, config, forward_loop)`). My initial model ran at 80 FPS, but after quantization it dropped to 40 FPS. Why? I checked the model struct…
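For context, the ModelOpt PTQ flow being described is roughly the following sketch; the CNN and calibration data are placeholders, while the config and `mtq.quantize` call follow ModelOpt's documented PTQ pattern:

```python
import torch
import torch.nn as nn
import modelopt.torch.quantization as mtq

# Placeholder CNN and calibration batches.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 8, 3, padding=1),
).eval()
calib_data = [torch.randn(1, 3, 32, 32) for _ in range(8)]

def forward_loop(m):
    # Run a handful of calibration batches through the model.
    with torch.no_grad():
        for batch in calib_data:
            m(batch)

model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```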