google-ai-edge / ai-edge-torch

Supporting PyTorch models with the Google AI Edge TFLite runtime.

Looking for selective post-training quantization for 8-bit weights and 16-bit activations #395

Open gaikwadrahul8 opened 4 days ago

gaikwadrahul8 commented 4 days ago

System information

- TensorFlow version (you are using): TF 2.13.0
- Are you willing to contribute it (Yes/No): No

Describe the feature and the current behavior/state.

Dear TF developers, I'm currently experimenting with PTQ using 8-bit weights and 16-bit activations (W8A16), and I've gotten great results. However, after some experimentation I have identified that only a certain part of my network requires the 16-bit activations. In other words, using 16-bit activations for the entire model is sub-optimal for my use case.
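For context, whole-model W8A16 PTQ is typically done through the TFLite converter's 16x8 mode. A minimal sketch of that setup (the saved-model path, input shape, and calibration data below are placeholders):

```python
import tensorflow as tf

def representative_dataset():
    # Yield a handful of calibration samples shaped like the model input
    # (placeholder shape; substitute real calibration data).
    for _ in range(100):
        yield [tf.random.normal([1, 224, 224, 3])]

converter = tf.lite.TFLiteConverter.from_saved_model("path/to/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# 16-bit activations with 8-bit weights (W8A16) for the entire model.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
]
tflite_model = converter.convert()
```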

Hence, I'm looking for a way to selectively quantize one part of my model to 8-bit weights and activations (W8A8), and the rest to W8A16.

In the current state, would this be possible somehow? A sketch of what I have in mind follows below.
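I have not found a documented converter flag that mixes A8 and A16 activations within one model, so here is a purely hypothetical sketch of the requested behavior, loosely modeled on a regex-scoped quantization recipe. The module path, the `Quantizer` class, the `update_quantization_recipe` method, and every argument name below are assumptions for illustration, not a confirmed API:

```python
# Hypothetical sketch only: module, class, and argument names are assumed
# for illustration and do not reflect a confirmed public API.
from ai_edge_quantizer import quantizer  # assumed module

qt = quantizer.Quantizer("model.tflite")

# Model-wide default: 8-bit weights, 8-bit activations (W8A8).
qt.update_quantization_recipe(
    regex=".*",                 # match every op scope
    operation_name="*",
    op_config={"weight_num_bits": 8, "activation_num_bits": 8},
)

# Override only the accuracy-sensitive scope with W8A16.
qt.update_quantization_recipe(
    regex=".*attention.*",      # hypothetical layer-name pattern
    operation_name="*",
    op_config={"weight_num_bits": 8, "activation_num_bits": 16},
)

quantized_model = qt.quantize()  # assumed to return the quantized flatbuffer
```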

Who will benefit from this feature? Platforms that support mixed-precision execution of activations.

Any Other info.

gaikwadrahul8 commented 4 days ago

This issue, originally reported by @Hrayo712, has been moved to this dedicated repository for ai-edge-torch to enhance issue tracking and prioritization. To ensure continuity, we have created this new issue on your behalf.

We appreciate your understanding and look forward to your continued involvement.