-
Hi everyone,
I built an NN with a BatchNormalization layer and have tried to quantize the whole model for an EdgeTPU application. I have read that I can use this layer after a Dense or Conv2D layer in t…
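For reference, a minimal post-training full-integer quantization sketch is below; the toy model and random calibration data are placeholders for the real network and dataset. The TFLite converter folds BatchNormalization into the preceding Conv2D/Dense during conversion, so the folded model can run fully in INT8 on the EdgeTPU:

```python
import numpy as np
import tensorflow as tf

# Toy model standing in for the real network: Conv2D followed by BatchNormalization.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(8, 3, padding="same"),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

def representative_data_gen():
    # Calibration batches; replace the random data with real samples.
    for _ in range(10):
        yield [np.random.rand(1, 32, 32, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
# Force full-integer ops so the EdgeTPU can map them; BatchNormalization is
# folded into the preceding Conv2D/Dense as part of the conversion.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

open("model_int8.tflite", "wb").write(converter.convert())
```

The resulting file can then be passed to the EdgeTPU compiler (`edgetpu_compiler model_int8.tflite`).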
-
**What is your question?**
Does cuvs support building an index when the dataset is larger than GPU memory?
Also, does cuvs support multi-GPU index building?
-
### Summary
Last year, we released [pytorch-labs/torchao](https://github.com/pytorch-labs/ao) to provide acceleration of Generative AI models using native PyTorch techniques. Torchao added support …
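As an illustration only, a minimal weight-only INT8 sketch using torchao's `quantize_` API is below; the toy model is a placeholder and the exact entry points may differ between torchao releases:

```python
import torch
from torchao.quantization import quantize_, int8_weight_only

# Toy stand-in for a generative model's linear layers (torchao generally targets bf16/fp16 weights).
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).to(torch.bfloat16).eval()

# Swap every nn.Linear weight for an int8 weight-only quantized tensor, in place.
quantize_(model, int8_weight_only())

print(type(model[0].weight))  # a quantized tensor subclass instead of a plain bf16 tensor
```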
-
Hello,
Is it possible to obtain a quantized .tflite version of YOLO v3 / YOLO Tiny v3 to do INT8 inference with the tools in this repository? I've tried using TensorFlow Lite's official tool, `toco`,…
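For context, a sketch of the flow with `tf.lite.TFLiteConverter` (toco's successor) is below, assuming the YOLOv3 graph has already been exported as a SavedModel at a hypothetical path `./yolov3_saved_model`:

```python
import numpy as np
import tensorflow as tf

# Hypothetical path: a YOLOv3 (or Tiny v3) model exported beforehand as a SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model("./yolov3_saved_model")

def representative_data_gen():
    # Calibration batches at the model's input resolution; replace with real images.
    for _ in range(100):
        yield [np.random.rand(1, 416, 416, 3).astype(np.float32)]

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

open("yolov3_int8.tflite", "wb").write(converter.convert())
```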
-
Hi,
I've been working with hls4ml to synthesize a model for the ZCU104. After quantization-aware training with QKeras (2 bits), I convert the model to hls4ml and run synthesis. However, when I…
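For context, a minimal QKeras-to-hls4ml flow of the kind described above (the toy model, quantizer settings, and output directory are placeholders; `xczu7ev-ffvc1156-2-e` is the ZCU104 part):

```python
import hls4ml
from qkeras import QActivation, QDense, quantized_bits, quantized_relu
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Sequential

# Toy 2-bit QKeras model standing in for the real network.
model = Sequential([
    Input(shape=(16,)),
    QDense(32, kernel_quantizer=quantized_bits(2, 0, alpha=1),
           bias_quantizer=quantized_bits(2, 0, alpha=1)),
    QActivation(quantized_relu(2)),
    QDense(5, kernel_quantizer=quantized_bits(2, 0, alpha=1),
           bias_quantizer=quantized_bits(2, 0, alpha=1)),
])

# Per-layer precision configuration derived from the QKeras quantizers.
config = hls4ml.utils.config_from_keras_model(model, granularity='name')

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir='hls4ml_prj',
    part='xczu7ev-ffvc1156-2-e',  # ZCU104
)
hls_model.compile()
hls_model.build(csim=False, synth=True)
```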
-
Hi, thanks for your excellent work.
After training a model with LSQplus, how can I export the scale/zero-point (S/Z) values, similar to how AIMET exports a .encodings file, so they can then be used in SNPE?
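One possible approach, sketched below, is to walk the trained model and dump each quantizer's learned step size and offset to JSON. The attribute names `s` and `beta`, and the output layout, are assumptions: match them to the actual LSQplus code and to whatever format SNPE expects.

```python
import json
import torch

def export_scale_offset(model: torch.nn.Module, path: str, bitwidth: int = 8):
    """Write learned quantization parameters to a JSON file.

    Assumption: each LSQ+-style quantizer stores its learned step size in `s`
    and its learned offset in `beta`; rename these to match the real modules.
    """
    encodings = {}
    q_min, q_max = -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1
    for name, module in model.named_modules():
        if hasattr(module, "s") and hasattr(module, "beta"):
            s = float(module.s.detach().abs().mean())
            beta = float(module.beta.detach().mean())
            encodings[name] = {
                "scale": s,
                "offset": int(round(-beta / s)),  # integer zero-point
                "min": s * q_min + beta,          # representable float range
                "max": s * q_max + beta,
                "bitwidth": bitwidth,
            }
    with open(path, "w") as f:
        json.dump(encodings, f, indent=2)
```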
-
The current version of MXNet only provides a register_forward_hook() function. However, a register_backward_hook() function would also be useful (e.g., for logging gradient information w.r.t. the block, or overwri…
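In the meantime, one workaround is to read (or overwrite) parameter gradients right after `backward()` returns, as in the sketch below; this is not a true backward hook, and the toy network is only a placeholder:

```python
import mxnet as mx
from mxnet import autograd, gluon

# Toy block standing in for the real network.
net = gluon.nn.Dense(4)
net.initialize()

x = mx.nd.random.uniform(shape=(8, 16))
y = mx.nd.random.uniform(shape=(8, 4))
loss_fn = gluon.loss.L2Loss()

with autograd.record():
    loss = loss_fn(net(x), y)
loss.backward()

# In place of a backward hook: inspect or modify gradients once backward() is done.
for name, param in net.collect_params().items():
    if param.grad_req != 'null':
        grad = param.grad()
        print(name, 'grad mean:', grad.mean().asscalar())
        # Overwriting is also possible, e.g. simple gradient clipping:
        # param.grad()[:] = mx.nd.clip(grad, -1.0, 1.0)
```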
-
Are there plans to add flash attention and also flash decoding, to improve performance for long contexts?
-
Hi!
Thank you for the paper! It is inspiring that you can compress weights to about 1 bit and the model still works better than random.
A practical sub-2-bit quantization algorithm would be a grea…
-
Hi, when I try quantization-aware training on my model, I get the following error for my 'CustomLayerMaxPooling1D' layer:
---------------------------------------------------------------------------
…
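If this is the usual "layer is not supported" error that tfmot raises for custom layers, one possible fix is sketched below: annotate the custom layer with a `QuantizeConfig` and apply quantization inside `quantize_scope`. The stand-in layer and the no-op config (pooling has no weights to quantize) are assumptions about the original setup:

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

quantize_layer = tfmot.quantization.keras.quantize_annotate_layer


class CustomLayerMaxPooling1D(tf.keras.layers.MaxPooling1D):
    """Stand-in for the custom pooling layer from the question."""


class NoOpQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    """Pooling has no weights, so nothing inside the layer needs quantizing."""

    def get_weights_and_quantizers(self, layer):
        return []

    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        pass

    def set_quantize_activations(self, layer, quantize_activations):
        pass

    def get_output_quantizers(self, layer):
        return []

    def get_config(self):
        return {}


annotated_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 8)),
    quantize_layer(tf.keras.layers.Conv1D(16, 3, padding="same")),
    quantize_layer(CustomLayerMaxPooling1D(pool_size=2),
                   quantize_config=NoOpQuantizeConfig()),
    tf.keras.layers.Flatten(),
    quantize_layer(tf.keras.layers.Dense(10)),
])

# Custom classes must be visible to quantize_apply via quantize_scope.
with tfmot.quantization.keras.quantize_scope(
        {"CustomLayerMaxPooling1D": CustomLayerMaxPooling1D,
         "NoOpQuantizeConfig": NoOpQuantizeConfig}):
    qat_model = tfmot.quantization.keras.quantize_apply(annotated_model)
```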