-
We aim to implement a system that leverages distillation and quantization to create a "child" neural network by combining parameters from two "parent" neural networks. The child network should inherit…
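A minimal sketch of what such a system might look like, assuming same-architecture parents merged by per-tensor interpolation and Hinton-style distillation against the two-parent ensemble. The merge rule, `alpha`, and temperature `T` are all illustrative assumptions, not the issue's specification:

```python
import numpy as np

def combine_parents(theta_a, theta_b, alpha=0.5):
    """Hypothetical child initialization: per-tensor interpolation of two
    parent checkpoints that share an architecture (an assumption; the
    actual merge rule is not specified above)."""
    return {k: alpha * theta_a[k] + (1.0 - alpha) * theta_b[k] for k in theta_a}

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(child_logits, parent_logits_a, parent_logits_b, T=2.0):
    """Standard temperature-scaled distillation loss, KL(teacher || student),
    where the teacher is the mean of the two parents' softened distributions
    (the two-teacher ensemble is an assumption)."""
    teacher = 0.5 * (softmax(parent_logits_a / T) + softmax(parent_logits_b / T))
    student = softmax(child_logits / T)
    kl = np.sum(teacher * (np.log(teacher + 1e-12) - np.log(student + 1e-12)))
    return float(kl * T * T / child_logits.shape[0])
```

Quantization of the child would then be layered on top (e.g. quantization-aware training with this loss as the task objective).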
-
Hello authors,
Thank you for your excellent work.
I've tried using AIMET to resolve a severe performance degradation caused by quantization when using the SNPE library. However, I've …

-
## Description
Hi,
I have been using the INT8 Entropy Calibrator 2 for INT8 quantization in Python and it’s been working well (TensorRT 10.0.1). The example of how I use the INT8 Entropy Calibra…
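For context on what the calibrator computes internally, here is a simplified numpy sketch of the KL-divergence threshold search that entropy calibration is based on. The real TensorRT implementation differs in binning, smoothing, and search details; this only illustrates the idea of picking the clipping threshold that minimizes information loss:

```python
import numpy as np

def entropy_threshold(activations, num_bins=512, num_levels=128):
    """Pick a clipping threshold for |activation| by minimizing the KL
    divergence between the clipped histogram and its num_levels-level
    quantized approximation (simplified sketch of entropy calibration)."""
    hist, edges = np.histogram(np.abs(activations), bins=num_bins)
    best_kl, best_t = np.inf, edges[-1]
    for i in range(num_levels * 2, num_bins + 1, 4):
        ref = hist[:i].astype(np.float64).copy()
        ref[-1] += hist[i:].sum()          # outliers clipped into the last bin
        if ref.sum() == 0:
            continue
        # merge the i bins down to num_levels levels, then expand back so
        # both distributions share the same support
        idx = np.arange(i) * num_levels // i
        merged = np.bincount(idx, weights=ref, minlength=num_levels)
        counts = np.bincount(idx, weights=(ref > 0).astype(np.float64),
                             minlength=num_levels)
        q = np.where(ref > 0, merged[idx] / np.maximum(counts[idx], 1.0), 0.0)
        p, qd = ref / ref.sum(), q / q.sum()
        nz = p > 0
        kl = float(np.sum(p[nz] * np.log(p[nz] / qd[nz])))
        if kl < best_kl:
            best_kl, best_t = kl, edges[i]
    return best_t
```

The int8 scale then follows as `threshold / 127` for symmetric quantization.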
-
Hi,
Ternary quantization has become popular and has demonstrated computational speedups and power reductions in projects such as llama.cpp and [bitnet.cpp](https://github.com/microsoft/B…
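As a reference point, absmean ternarization in the style of BitNet b1.58 can be sketched in a few lines. Per-tensor scaling is an assumption here (per-row or per-group scaling is also common), and real kernels additionally pack the ternary values rather than storing int8:

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    """Absmean ternarization: map weights to {-1, 0, +1} plus one
    per-tensor scale (sketch; BitNet b1.58 uses this rule)."""
    scale = np.abs(w).mean() + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q.astype(np.int8), scale

def ternary_dequantize(q, scale):
    """Reconstruct approximate float weights from the ternary codes."""
    return q.astype(np.float32) * scale
```

The speedups come from replacing multiplications with additions/subtractions wherever the code is ±1 and skipping zeros entirely.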
-
# TensorRT Model Optimizer - Product Roadmap
[TensorRT Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer) (ModelOpt)’s north star is to be the best-in-class model optimization toolki…
-
Hello, ONNX Runtime development team.
We would like to ask a question about the quantization of batch normalization.
We use ONNX Runtime 1.9.0 with static quantization.
If we use "Network A", Batch Normalizatio…
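A common way to sidestep BatchNorm quantization issues is to fold the BN parameters into the preceding convolution or linear layer before quantizing, so no standalone BatchNorm node remains. A minimal numpy sketch for a linear layer (the `eps` value and weight layout are assumptions):

```python
import numpy as np

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm (gamma, beta, running mean/var) into the preceding
    layer's weight w (out_features x in_features) and bias b, so that
    x @ w_f.T + b_f == BN(x @ w.T + b)."""
    s = gamma / np.sqrt(var + eps)   # per-output-channel rescale
    w_f = w * s[:, None]
    b_f = (b - mean) * s + beta
    return w_f, b_f
```

After folding, only the fused layer needs quantization parameters, which usually behaves better than quantizing BN separately.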
-
## 🐛 Bug
I'm looking at generating an int8 quantised PyTorch model (both weights and activations at int8) and exporting it to StableHLO via `torch-xla`'s `exported_program_to_stablehlo`.
Right no…
-
### Is your feature request related to a problem?
After documents are ingested by the **text_embedding** processor, an array of float32 values per **knn_vector** field is stored in segments (HNSW or IVF).
…
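One common way to shrink such float32 vectors is scalar quantization to 8-bit codes, cutting storage roughly 4x at a small recall cost. A rough numpy sketch of the generic technique (not OpenSearch's actual implementation; function names are illustrative, and per-dimension min/max is a typical refinement over the global min/max used here):

```python
import numpy as np

def scalar_quantize(vectors):
    """Min/max scalar quantization of float32 vectors to uint8 codes plus
    a shared (lo, scale) pair (sketch of the generic technique)."""
    lo, hi = float(vectors.min()), float(vectors.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    q = np.round((vectors - lo) / scale).astype(np.uint8)
    return q, lo, scale

def scalar_dequantize(q, lo, scale):
    """Reconstruct approximate float32 vectors from the uint8 codes."""
    return q.astype(np.float32) * scale + lo
```

Distance computations can then run directly on the uint8 codes or on dequantized values, with the original float32 arrays no longer stored.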
-
This is an issue to collect the additional "network health" data we would want to have available in the app API.
There is currently some network data exposed to the app API via the [network…
-
Self-Compressing Neural Networks is a dynamic quantization-aware training method that includes the model's size in the loss
Paper: https://arxiv.org/pdf/2301.13142
Code: https://github.com/geohot/ai-noteb…
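The core mechanism can be sketched from the paper's quantizer, q(x) = 2^e · clip(round(x / 2^e), -2^(b-1), 2^(b-1) - 1), plus a size term added to the task loss. During training, b and e are learned as continuous per-layer parameters via straight-through estimators; the forward math below and the `gamma` weighting are a simplified sketch, not the paper's exact formulation:

```python
import numpy as np

def quantize(x, b, e):
    """Quantize x to b bits with exponent e:
    q(x) = 2^e * clip(round(x / 2^e), -2^(b-1), 2^(b-1) - 1)."""
    lo, hi = -2 ** (b - 1), 2 ** (b - 1) - 1
    return 2.0 ** e * np.clip(np.round(x / 2.0 ** e), lo, hi)

def size_loss(bit_depths, weight_counts, gamma=1.0):
    """Model-size term added to the task loss: gamma times the total bit
    count across layers (gamma is an assumed hyperparameter name)."""
    total_bits = sum(b * n for b, n in zip(bit_depths, weight_counts))
    return gamma * total_bits
```

Because the size term penalizes bits, gradient descent drives each layer's bit depth down until the accuracy loss outweighs the size saving.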