This issue, originally reported by @Hrayo712, has been moved to this dedicated ai-edge-torch repository to improve issue tracking and prioritization. To ensure continuity, we have created this new issue on your behalf.
We appreciate your understanding and look forward to your continued involvement.
System information
- TensorFlow version (you are using): TF 2.13.0
- Are you willing to contribute it (Yes/No): No

Describe the feature and the current behavior/state.
Dear TF developers, I'm currently experimenting with PTQ using 8-bit weights and 16-bit activations (W8A16), and I've gotten great results. However, after some experimentation I have found that only a certain part of my network actually requires the 16-bit activations. In other words, using 16-bit activations for the entire model is sub-optimal for my use case.
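For reference, here is a minimal sketch of how full-model W8A16 PTQ can be enabled through the TFLite converter's 16x8 mode; the `model` object and the `calibration_samples` iterable are placeholders standing in for the actual model and calibration data:

```python
import tensorflow as tf

def representative_dataset():
    # Yield a handful of calibration samples matching the model's input signature.
    for sample in calibration_samples:  # placeholder calibration data
        yield [sample]

converter = tf.lite.TFLiteConverter.from_keras_model(model)  # `model` is a placeholder
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# 16-bit activations with 8-bit weights (W8A16), applied to the whole model.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8
]
tflite_model = converter.convert()
```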
Hence, I'm looking for a way to selectively quantize one part of my model to 8-bit weights and activations (W8A8) and the other part to W8A16.
In the current state, would this be possible somehow?
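One workaround that seems possible today, sketched below under the assumption that the network can be split into two Keras submodels (`frontend` and `backend` are hypothetical names, as are the two calibration generators), is to convert each part with a different activation precision and chain the resulting interpreters at runtime:

```python
import tensorflow as tf

def convert(model, supported_ops, rep_data):
    # Shared PTQ conversion helper; only the activation precision differs per part.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = rep_data
    converter.target_spec.supported_ops = supported_ops
    return converter.convert()

# W8A8 for the part that tolerates 8-bit activations.
frontend_tflite = convert(
    frontend, [tf.lite.OpsSet.TFLITE_BUILTINS_INT8], frontend_rep_data)

# W8A16 for the precision-sensitive part.
backend_tflite = convert(
    backend,
    [tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8],
    backend_rep_data)
```

At inference time the two interpreters would be run back to back; with the converters' default float32 model inputs and outputs the tensor at the split point can be passed directly, at the cost of an extra quantize/dequantize pair at the boundary. A first-class way to express per-layer activation precision in a single converted model would avoid both the manual split and that overhead.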
Who will benefit from this feature? Platforms that support mixed-precision execution of activations.
Any Other info.