This issue, originally reported by @Hrayo712, has been moved to this dedicated ai-edge-torch repository to improve issue tracking and prioritization. To ensure continuity, we have created this new issue on your behalf.
We appreciate your understanding and look forward to your continued involvement.
System information
- TensorFlow version (you are using): TF 2.13.0
- Are you willing to contribute it (Yes/No): No

Describe the feature and the current behavior/state.
Dear TF developers, I'm currently experimenting with PTQ using 8-bit weights and 16-bit activations (W8A16), and I've gotten great results. However, after some experimentation I have found that only a certain part of my network actually requires the 16-bit activations. In other words, using 16-bit activations for the entire model is sub-optimal for my use case.
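For reference, here is a minimal sketch of how full-model W8A16 PTQ can be enabled through the TFLite converter's 16x8 mode; the `model` object and the `calibration_samples` iterable are placeholders standing in for the actual model and calibration data:

```python
import tensorflow as tf

def representative_dataset():
    # Yield a handful of calibration samples matching the model's input signature.
    for sample in calibration_samples:  # placeholder calibration data
        yield [sample]

converter = tf.lite.TFLiteConverter.from_keras_model(model)  # `model` is a placeholder
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# 16-bit activations with 8-bit weights (W8A16), applied to the whole model.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
]
tflite_model = converter.convert()
```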
Hence, I'm looking for a way to selectively quantize one part of my model to 8-bit weights and activations (W8A8) and the other part to W8A16.
In the current state, would this be possible somehow?
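One workaround that seems possible today, sketched below under the assumption that the network can be split into two Keras submodels (`frontend` and `backend` are hypothetical names, as are the two calibration generators), is to convert each part with a different activation precision and chain the resulting interpreters at runtime:

```python
import tensorflow as tf

def convert(model, supported_ops, rep_data):
    # Shared PTQ conversion helper; only the activation precision differs per part.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = rep_data
    converter.target_spec.supported_ops = supported_ops
    return converter.convert()

# W8A8 for the part that tolerates 8-bit activations.
frontend_tflite = convert(
    frontend, [tf.lite.OpsSet.TFLITE_BUILTINS_INT8], frontend_rep_data)

# W8A16 for the precision-sensitive part.
backend_tflite = convert(
    backend,
    [tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8],
    backend_rep_data)
```

At inference time the two interpreters would be run back to back; with the converters' default float32 model inputs and outputs the tensor at the split point can be passed directly, at the cost of an extra quantize/dequantize pair at the boundary. A first-class way to express per-layer activation precision in a single converted model would avoid both the manual split and that overhead.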
Who will benefit from this feature? Platforms that support mixed-precision execution of activations.
Any Other info.